Author: Maxence Mirosavic
Date: June, 2026
This project studies horror fiction through large-scale annotation of literary texts using Large Language Models (LLMs).
The corpus includes approximately one hundred novels, including a substantial collection of Stephen King's works, allowing both cross-author comparisons and diachronic analyses of King's production.
The objective is to identify and quantify narrative features associated with horror, including:
- threats
- characters
- setting characteristics
- narrative perspective
- emotional and cognitive dimensions of fear
Annotations are produced automatically using OpenAI language models and are stored as structured datasets for further statistical analysis.
data/
raw corpus and annotations outputs
lib/
utility modules
batch processing tools
prompts/
annotation prompts
notebooks/
exploratory analyses
figures/
generated plots and visualizations
The annotation process is performed in multiple waves.
Detection and extraction of:
- threats
- character lists Plus setting-level annotations:
- setting hostility
- setting hideability
- ...
Threat-level annotations:
- threat salience
- temporal immediacy
- resemblance to existing predators (sharp teeth, claws...)
- ...
Character-level annotations:
- character centrality
- vulnerability
- agency
- ...
Character annotations are performed on an exploded dataframe containing one row per (chunk, character) pair before being merged back into a chunk-level dataset.
The project uses a custom Batch class built on the OpenAI Batch API.
Main features:
- dataframe-based workflow
- automatic prompt templating
- unique request identifiers
- automatic parsing of model outputs
- score extraction
- integration of results back into pandas dataframes
Typical usage:
batch = Batch(df, client, variables, questions, ('book_index', 'chunk_index')) batch.build_requests() batch.submit() batch.parse_results() batch.export('results.csv')
Main Python packages:
- pandas
- openai
- numpy
- matplotlib
The primary unit of analysis is the text chunk of approximately 300 OpenAI-compatible tokens. It equals roughly to a double-spaced written page (according to the numbers given by OpenAI) and allow for a temporal analysis of the evolution of various features within the novel.
Each chunk is identified by:
book_index: the index of the book within the corpuschunk_index: the index of the chunk within the book
Character-level annotations additionally use;
character_rank: the index of the character within the list given back by the LLM to preserve the original order of characters within a chunk.
Project developed as part of a research internship on horror fiction analysis.