NextSteamGame

NextSteamGame is a Steam recommendation project built around the idea that games should be matched by what they are, not only by player-overlap signals.

Most recommendation systems rely heavily on player-overlap data:

Players who liked X also liked Y.

That works well for popular games, but it often struggles with niche preferences and rarely explains why two games are similar.

For example:

Someone may enjoy Persona 5 because of its jazz fusion soundtrack and modern Tokyo setting.
Another player may enjoy Persona 5 because of its social simulation and dungeon crawling.

Most recommendation systems treat those users identically.

NextSteamGame attempts to separate those signals and let users directly control which aspects of a game matter most.

The project has three main layers:

a metadata pipeline that builds and enriches steam_metadata.db
a review/semantics pipeline that builds steam_initial_noncanon.db and steam_final_canon.db
a live app stack that serves recommendations through FastAPI + a React frontend backed by Postgres

App Preview

Current runtime application:

How Recommendations Work

I wanted a recommendation system that could answer:

Why is this game being recommended?

Traditional recommenders often know that two games are related but cannot clearly explain the connection.

For example:

Star Wars and The Lord of the Rings may be recommended together because both follow a hero's journey.
Two players may enjoy the same game for completely different reasons.
A niche mechanic may be more important than the genre itself.

The goal of NextSteamGame is to build recommendations around a game's semantic identity rather than only player behavior.

The current database contains roughly:

80,000 Steam games
up to 2,000 reviews per game
semantic vectors
identity tags
canonicalized genre and tag relationships

Stage 1: Metadata Collection

The first stage creates a local metadata database using Steam's APIs and SteamSpy.

This includes:

appids
genres
tags
descriptions
release information
storefront artwork

Output:

steam_metadata.db
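As an illustration of what a local metadata store like this could look like, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and the upsert_game helper are assumptions made for this example; the project's actual schema is not described here.

```python
import json
import sqlite3

# Hypothetical schema: one row per Steam app, with list fields stored as JSON text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS games (
    appid        INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    genres       TEXT NOT NULL,
    tags         TEXT NOT NULL,
    description  TEXT,
    release_date TEXT,
    header_image TEXT
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the metadata database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert_game(conn: sqlite3.Connection, game: dict) -> None:
    """Insert a game, or replace the existing row with the same appid."""
    conn.execute(
        "INSERT OR REPLACE INTO games VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            game["appid"],
            game["name"],
            json.dumps(game.get("genres", [])),
            json.dumps(game.get("tags", [])),
            game.get("description"),
            game.get("release_date"),
            game.get("header_image"),
        ),
    )
    conn.commit()


def get_tags(conn: sqlite3.Connection, appid: int) -> list:
    """Return the stored tag list for one appid, or an empty list if unknown."""
    row = conn.execute("SELECT tags FROM games WHERE appid = ?", (appid,)).fetchone()
    return json.loads(row[0]) if row else []
```

Passing a file path such as "steam_metadata.db" to open_db would persist the data; the default in-memory database is only convenient for experimenting.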

Stage 2: Review Collection & Filtering

For each game, the pipeline collects up to 2,000 reviews.

Reviews are processed through several filtering stages:

regex spam removal
review quality scoring
word diversity scoring
insightful phrase detection
review ranking heuristics

The goal is to prioritize reviews that actually explain the game rather than memes or one-line comments.
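The filtering idea can be sketched roughly as follows. This is a simplified illustration, not the project's actual heuristics: the spam patterns, the min_words cutoff, the 100-word length cap, and the way the scores are combined are all assumptions chosen for the example.

```python
import re

# Hypothetical spam patterns; the project's real regex list is not shown here.
SPAM_PATTERNS = [
    re.compile(r"(.)\1{5,}"),                    # one character repeated 6+ times
    re.compile(r"https?://\S+", re.IGNORECASE),  # links
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def is_spam(text: str) -> bool:
    """True if any spam pattern matches anywhere in the review."""
    return any(p.search(text) for p in SPAM_PATTERNS)


def word_diversity(text: str) -> float:
    """Unique words divided by total words (a type-token ratio)."""
    words = tokenize(text)
    return len(set(words)) / len(words) if words else 0.0


def quality_score(text: str, min_words: int = 8) -> float:
    """Score a review in [0, 1]; spam and very short reviews score 0."""
    words = tokenize(text)
    if is_spam(text) or len(words) < min_words:
        return 0.0
    # Length contributes up to a cap of 100 words, so rambling is not rewarded.
    length_factor = min(len(words), 100) / 100
    return word_diversity(text) * length_factor


def rank_reviews(reviews: list[str], keep: int = 3) -> list[str]:
    """Drop zero-score reviews and return the top `keep` by quality score."""
    scored = [(quality_score(r), r) for r in reviews]
    scored = [(s, r) for s, r in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored[:keep]]
```

With this scoring, a one-line comment such as "10/10" is dropped because it contains no words, while a multi-sentence description of a game's systems is kept and ranked.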

Reviews are then classified with ModernBERT into categories such as:

gameplay explanations
artistic discussion
soundtrack discussion
systems depth
general descriptive reviews

This gives the pipeline separate review pools focused on different aspects of a game.

Stage 3: Semantic Tag & Vector Generation

The highest-quality review candidates are then passed into an LLM extraction pipeline.

This stage generates:

Focus Vectors

mechanics
narrative
vibe
structure_loop

Identity Metadata

signature tags
niche anchors
identity tags
music tags
micro-tags

The goal is to capture details often missing from traditional tags.

For example, many players describe PlateUp!'s late-game automation systems as the most important part of the experience despite the game primarily being categorized as a cooperative cooking game.

Stage 4: Canonical Tag Mapping

Generated tags frequently describe the same concept using different wording.

For example:

Fast Action
Quick Action
High-Speed Combat

All represent nearly identical ideas.

To solve this, I built a separate canonicalization pipeline using:

heuristics
fuzzy matching
embedding similarity
vector search

This groups semantically similar tags together while preserving niche distinctions.

Output:

steam_final_canon.db
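A minimal sketch of how fuzzy matching and embedding similarity might be combined to group tags is shown below. It is an assumption-laden illustration rather than the project's pipeline: the greedy grouping strategy, both thresholds, the hand-written 3-dimensional vectors, and the "Cozy Farming" tag are made up for the example. A real setup would take vectors from an embedding model and use a vector search index instead of a linear scan.

```python
import math
from difflib import SequenceMatcher


def normalize(tag: str) -> str:
    """Lowercase, treat hyphens as spaces, and collapse repeated whitespace."""
    return " ".join(tag.lower().replace("-", " ").split())


def fuzzy(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized tags."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def canonicalize(tags, embeddings, fuzzy_min=0.85, cosine_min=0.9):
    """Greedily assign each tag to the first canonical tag it resembles.

    A tag joins a group when either signal clears its threshold;
    otherwise it becomes the canonical tag of a new group.
    """
    groups: dict[str, list[str]] = {}
    for tag in tags:
        for canon in groups:
            similar_text = fuzzy(tag, canon) >= fuzzy_min
            similar_meaning = cosine(embeddings[tag], embeddings[canon]) >= cosine_min
            if similar_text or similar_meaning:
                groups[canon].append(tag)
                break
        else:
            groups[tag] = [tag]
    return groups


# Toy vectors standing in for real embeddings (values are invented).
TOY_EMBEDDINGS = {
    "Fast Action": [0.90, 0.10, 0.00],
    "Quick Action": [0.88, 0.15, 0.00],
    "High-Speed Combat": [0.85, 0.20, 0.05],
    "Cozy Farming": [0.00, 0.10, 0.95],
}

groups = canonicalize(list(TOY_EMBEDDINGS), TOY_EMBEDDINGS)
```

With these toy vectors, the three action tags end up under the canonical tag "Fast Action", while "Cozy Farming" stays in its own group. Keeping the thresholds high is one way to avoid merging tags that are related but distinct.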

Stage 5: Retrieval Optimization

Computing similarity between every pair of games at runtime would be expensive.

Instead, NextSteamGame precomputes candidate relationships offline.

When a user searches:

candidate games are retrieved
user weighting is applied
recommendations are reranked

This keeps per-request work in the live application small while still allowing real-time customization.
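The weighting and reranking steps could look something like the sketch below. It assumes each precomputed candidate carries a per-axis similarity to the reference game for the four focus vectors, and that the final score is a weighted average of those values; the data layout, the rerank function, and the example games are hypothetical.

```python
FOCUS_AXES = ("mechanics", "narrative", "vibe", "structure_loop")


def rerank(candidates: list[dict], weights: dict[str, float]) -> list[dict]:
    """Sort candidates by the weighted average of their per-axis similarities.

    `candidates` items look like {"name": ..., "similarity": {axis: value}}.
    `weights` maps focus axes to non-negative user-chosen importance values.
    """
    total = sum(weights.get(axis, 0.0) for axis in FOCUS_AXES)
    if total <= 0:
        raise ValueError("at least one focus axis needs a positive weight")

    def score(candidate: dict) -> float:
        sims = candidate["similarity"]
        weighted = sum(weights.get(a, 0.0) * sims.get(a, 0.0) for a in FOCUS_AXES)
        return weighted / total

    return sorted(candidates, key=score, reverse=True)


# Invented candidates: Game A matches on mechanics, Game B on story and mood.
CANDIDATES = [
    {"name": "Game A", "similarity": {"mechanics": 0.9, "narrative": 0.2,
                                      "vibe": 0.3, "structure_loop": 0.8}},
    {"name": "Game B", "similarity": {"mechanics": 0.3, "narrative": 0.9,
                                      "vibe": 0.8, "structure_loop": 0.2}},
]

mechanics_first = rerank(CANDIDATES, {"mechanics": 3, "narrative": 1,
                                      "vibe": 1, "structure_loop": 1})
narrative_first = rerank(CANDIDATES, {"mechanics": 1, "narrative": 3,
                                      "vibe": 1, "structure_loop": 1})
```

The same two candidates come back in opposite orders depending on the weights: Game A leads when mechanics is emphasized, and Game B leads when narrative is. Because only a small precomputed candidate set is rescored, this step stays cheap per request.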

Pipeline

backend: FastAPI
frontend: Next.js / React
runtime game store: Postgres
retrieval target: local Chroma
upstream build artifacts: SQLite

The app flow is:

search for a Steam game
open it as the reference profile
inspect and adjust its focus vectors, identity tags, genres, and appeal axes
rerank recommendations from the game's semantic profile
