Horror Fiction Analysis

Author: Maxence Mirosavic

Date: June, 2026

Overview

This project studies horror fiction through large-scale annotation of literary texts using Large Language Models (LLMs).

The corpus includes approximately one hundred novels, including a substantial collection of Stephen King's works, allowing both cross-author comparisons and diachronic analyses of King's production.

The objective is to identify and quantify narrative features associated with horror, including:

threats
characters
setting characteristics
narrative perspective
emotional and cognitive dimensions of fear

Annotations are produced automatically using OpenAI language models and are stored as structured datasets for further statistical analysis.

Repository structure

data/ raw corpus and annotations outputs

lib/ utility modules batch processing tools

prompts/ annotation prompts

notebooks/ exploratory analyses

figures/ generated plots and visualizations

Annotation Pipeline

The annotation process is performed in multiple waves.

Wave 1

Detection and extraction of:

threats
character lists Plus setting-level annotations:
setting hostility
setting hideability
...

Wave 2

Threat-level annotations:

threat salience
temporal immediacy
resemblance to existing predators (sharp teeth, claws...)
...

Wave 3

Character-level annotations:

character centrality
vulnerability
agency
...

Character annotations are performed on an exploded dataframe containing one row per (chunk, character) pair before being merged back into a chunk-level dataset.

Batch Annotation System

The project uses a custom Batch class built on the OpenAI Batch API.

Main features:

dataframe-based workflow
automatic prompt templating
unique request identifiers
automatic parsing of model outputs
score extraction
integration of results back into pandas dataframes

Typical usage: batch = Batch(df, client, variables, questions, ('book_index', 'chunk_index')) batch.build_requests() batch.submit() batch.parse_results() batch.export('results.csv')

Dependencies

Main Python packages:

pandas
openai
numpy
matplotlib

Data Format

The primary unit of analysis is the text chunk of approximately 300 OpenAI-compatible tokens. It equals roughly to a double-spaced written page (according to the numbers given by OpenAI) and allow for a temporal analysis of the evolution of various features within the novel.

Each chunk is identified by:

book_index: the index of the book within the corpus
chunk_index: the index of the chunk within the book

Character-level annotations additionally use;

character_rank: the index of the character within the list given back by the LLM to preserve the original order of characters within a chunk.

Authors

Project developed as part of a research internship on horror fiction analysis.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
data		data
lib		lib
3rchunks.csv		3rchunks.csv
README.md		README.md
main.ipynb		main.ipynb
prompts.csv		prompts.csv
rchunks-results.csv		rchunks-results.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Horror Fiction Analysis

Overview

Repository structure

Annotation Pipeline

Wave 1

Wave 2

Wave 3

Batch Annotation System

Dependencies

Data Format

Authors

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Horror Fiction Analysis

Overview

Repository structure

Annotation Pipeline

Wave 1

Wave 2

Wave 3

Batch Annotation System

Dependencies

Data Format

Authors

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages