Skip to content

maxxencem/HorrorFictionAnalysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Horror Fiction Analysis

Author: Maxence Mirosavic

Date: June, 2026

Overview

This project studies horror fiction through large-scale annotation of literary texts using Large Language Models (LLMs).

The corpus includes approximately one hundred novels, including a substantial collection of Stephen King's works, allowing both cross-author comparisons and diachronic analyses of King's production.

The objective is to identify and quantify narrative features associated with horror, including:

  • threats
  • characters
  • setting characteristics
  • narrative perspective
  • emotional and cognitive dimensions of fear

Annotations are produced automatically using OpenAI language models and are stored as structured datasets for further statistical analysis.

Repository structure

data/ raw corpus and annotations outputs

lib/ utility modules batch processing tools

prompts/ annotation prompts

notebooks/ exploratory analyses

figures/ generated plots and visualizations

Annotation Pipeline

The annotation process is performed in multiple waves.

Wave 1

Detection and extraction of:

  • threats
  • character lists Plus setting-level annotations:
  • setting hostility
  • setting hideability
  • ...

Wave 2

Threat-level annotations:

  • threat salience
  • temporal immediacy
  • resemblance to existing predators (sharp teeth, claws...)
  • ...

Wave 3

Character-level annotations:

  • character centrality
  • vulnerability
  • agency
  • ...

Character annotations are performed on an exploded dataframe containing one row per (chunk, character) pair before being merged back into a chunk-level dataset.

Batch Annotation System

The project uses a custom Batch class built on the OpenAI Batch API.

Main features:

  • dataframe-based workflow
  • automatic prompt templating
  • unique request identifiers
  • automatic parsing of model outputs
  • score extraction
  • integration of results back into pandas dataframes

Typical usage: batch = Batch(df, client, variables, questions, ('book_index', 'chunk_index')) batch.build_requests() batch.submit() batch.parse_results() batch.export('results.csv')

Dependencies

Main Python packages:

  • pandas
  • openai
  • numpy
  • matplotlib

Data Format

The primary unit of analysis is the text chunk of approximately 300 OpenAI-compatible tokens. It equals roughly to a double-spaced written page (according to the numbers given by OpenAI) and allow for a temporal analysis of the evolution of various features within the novel.

Each chunk is identified by:

  • book_index: the index of the book within the corpus
  • chunk_index: the index of the chunk within the book

Character-level annotations additionally use;

  • character_rank: the index of the character within the list given back by the LLM to preserve the original order of characters within a chunk.

Authors

Project developed as part of a research internship on horror fiction analysis.

About

My internship annotation loop to analyze a set of horror fiction novels (including ~60 Stephen King novels and ~50 canonical horror novels, with works by H.P. Lovecraft, Ann Radcliffe and Edgar Allan Poe)

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors