NextSteamGame is a Steam recommendation project built around the idea that games
should be matched by what they are, not only by player-overlap signals.
Most recommendation systems rely heavily on player-overlap data:
Players who liked X also liked Y.
That works well for popular games, but it often struggles with niche preferences and rarely explains why two games are similar.
For example:
- Someone may enjoy Persona 5 because of its jazz fusion soundtrack and modern Tokyo setting.
- Another player may enjoy Persona 5 because of its social simulation and dungeon crawling.
Most recommendation systems treat those users identically.
NextSteamGame attempts to separate those signals and to let users directly control which aspects of a game matter most.
The project has three main layers:
- a metadata pipeline that builds and enriches steam_metadata.db
- a review/semantics pipeline that builds steam_initial_noncanon.db and steam_final_canon.db
- a live app stack that serves recommendations through FastAPI and a React frontend, backed by Postgres
I wanted a recommendation system that could answer:
Why is this game being recommended?
Traditional recommenders often know that two games are related but cannot clearly explain the connection.
For example:
- Star Wars and The Lord of the Rings may appeal to the same person because both follow a hero's journey.
- Two players may enjoy the same game for completely different reasons.
- A niche mechanic may be more important than the genre itself.
The goal of NextSteamGame is to build recommendations around a game's semantic identity rather than only player behavior.
The current database contains roughly:
- 80,000 Steam games
- up to 2,000 reviews per game
- semantic vectors
- identity tags
- canonicalized genre and tag relationships
The first stage creates a local metadata database using Steam's APIs and SteamSpy.
This includes:
- appids
- genres
- tags
- descriptions
- release information
- storefront artwork
Output:
steam_metadata.db
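The README does not show how the two sources are merged into one database. A minimal sketch, assuming the Steam store `appdetails` response and the SteamSpy `appdetails` response have already been fetched and parsed into dicts, could store one row per game in SQLite. The table layout and the `merge_record` / `upsert_game` helpers are hypothetical, and the sample payload is made up.

```python
import json
import sqlite3

# Hypothetical schema: the real steam_metadata.db layout is not documented here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS games (
    appid        INTEGER PRIMARY KEY,
    name         TEXT,
    description  TEXT,
    release_date TEXT,
    header_image TEXT,
    genres       TEXT,  -- JSON list of genre names from the Steam store
    tags         TEXT   -- JSON object of tag -> vote count from SteamSpy
);
"""


def merge_record(appid, store_data, spy_data):
    """Flatten one Steam store `data` object and one SteamSpy object into a row."""
    return (
        appid,
        store_data.get("name"),
        store_data.get("short_description"),
        store_data.get("release_date", {}).get("date"),
        store_data.get("header_image"),
        json.dumps([g["description"] for g in store_data.get("genres", [])]),
        # Treat a missing or empty tag value as "no tags".
        json.dumps(spy_data.get("tags") or {}),
    )


def upsert_game(conn, appid, store_data, spy_data):
    conn.execute(
        "INSERT OR REPLACE INTO games VALUES (?, ?, ?, ?, ?, ?, ?)",
        merge_record(appid, store_data, spy_data),
    )
    conn.commit()


# In-memory database for illustration; a real build would open steam_metadata.db.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up payloads shaped like already-fetched API responses (no network access here).
sample_store = {
    "name": "Example Game",
    "short_description": "A made-up game used for illustration.",
    "release_date": {"coming_soon": False, "date": "1 Jan, 2024"},
    "genres": [{"id": "1", "description": "Action"}, {"id": "23", "description": "Indie"}],
}
sample_spy = {"tags": {"Roguelike": 120, "Co-op": 80}}
upsert_game(conn, 123, sample_store, sample_spy)
```

Storing genres and tags as JSON text keeps the sketch to one table; a normalized layout with separate genre and tag tables would make the later canonicalization joins easier.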
For each game, the pipeline collects up to 2,000 reviews.
Reviews are processed through several filtering stages:
- regex spam removal
- review quality scoring
- word diversity scoring
- insightful phrase detection
- review ranking heuristics
The goal is to prioritize reviews that actually explain the game rather than memes or one-line comments.
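The filtering stages above could look roughly like the following sketch. The spam patterns, the "insightful" phrase list, and the scoring weights are all assumptions for illustration; the project's actual heuristics are not published in this README.

```python
import re

# Hypothetical spam patterns.
SPAM_PATTERNS = [
    re.compile(r"(.)\1{9,}"),                 # one character repeated 10+ times
    re.compile(r"^\W*$"),                     # no letters or digits at all (emoji walls, ASCII art)
    re.compile(r"https?://", re.IGNORECASE),  # links are usually ads or self-promotion
]

# Hypothetical phrases suggesting the reviewer explains the game rather than reacting to it.
INSIGHT_PHRASES = ("because", "compared to", "core loop", "late game", "mechanic", "soundtrack")


def is_spam(text):
    return any(p.search(text) for p in SPAM_PATTERNS)


def word_diversity(text):
    """Type-token ratio: unique words divided by total words (0.0 for no words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0


def insight_hits(text):
    lowered = text.lower()
    return sum(phrase in lowered for phrase in INSIGHT_PHRASES)


def quality_score(text):
    length = min(len(text.split()), 200) / 200  # saturates so walls of text do not dominate
    diversity = word_diversity(text)            # penalizes copy-pasted repetition
    insight = min(insight_hits(text), 3) / 3    # capped bonus for explanatory phrasing
    return length * (0.5 + 0.5 * diversity) + 0.3 * insight


def rank_reviews(reviews, keep=50):
    """Drop spam, then return up to `keep` reviews, best first."""
    candidates = [r for r in reviews if not is_spam(r)]
    return sorted(candidates, key=quality_score, reverse=True)[:keep]
```

Multiplying length by diversity means a one-word review cannot win on diversity alone, while the capped insight bonus rewards reviews that explain why something works.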
Reviews are then classified with ModernBERT into categories such as:
- gameplay explanations
- artistic discussion
- soundtrack discussion
- systems depth
- general descriptive reviews
This gives the pipeline separate review pools focused on different aspects of a game.
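A minimal sketch of the pooling step, assuming the classifier returns one (label, confidence) pair per review. The label strings, the confidence cutoff, and the keyword classifier are assumptions; the keyword function is only a toy stand-in for ModernBERT so the sketch runs without downloading a model.

```python
from collections import defaultdict

# Category labels mirror the README's list; the exact label strings are assumptions.
CATEGORIES = ("gameplay", "artistic", "soundtrack", "systems_depth", "general")


def build_pools(reviews, classify, min_confidence=0.6, per_pool=20):
    """Route reviews into per-category pools.

    `classify` takes a list of texts and returns one (label, confidence) pair per text.
    Unknown labels and low-confidence predictions fall back to the "general" pool,
    and each pool keeps its `per_pool` most confident reviews.
    """
    pools = defaultdict(list)
    for review, (label, confidence) in zip(reviews, classify(reviews)):
        if label not in CATEGORIES or confidence < min_confidence:
            label = "general"
        pools[label].append((confidence, review))
    return {
        label: [review for _, review in sorted(items, key=lambda x: x[0], reverse=True)[:per_pool]]
        for label, items in pools.items()
    }


def keyword_classifier(reviews):
    """Toy stand-in for the ModernBERT model, for illustration only."""
    results = []
    for text in reviews:
        t = text.lower()
        if "soundtrack" in t or "music" in t:
            results.append(("soundtrack", 0.9))
        elif "art style" in t or "visuals" in t:
            results.append(("artistic", 0.8))
        elif "build" in t or "synerg" in t:
            results.append(("systems_depth", 0.7))
        else:
            results.append(("gameplay", 0.4))
    return results


# A fine-tuned ModernBERT checkpoint (the model name below is hypothetical) could replace
# the toy classifier via Hugging Face transformers:
#   from transformers import pipeline
#   clf = pipeline("text-classification", model="your-org/modernbert-review-categories")
#   classify = lambda texts: [(p["label"], p["score"]) for p in clf(texts)]
```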
The highest-quality review candidates are then passed into an LLM extraction pipeline.
This stage generates:
- mechanics
- narrative
- vibe
- structure_loop
- signature tags
- niche anchors
- identity tags
- music tags
- micro-tags
The goal is to capture details often missing from traditional tags.
For example, many players describe PlateUp!'s late-game automation systems as the most important part of the experience, even though the game is primarily categorized as a cooperative cooking game.
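The README does not show the prompt or the expected output format. A minimal sketch, assuming the LLM is asked for a JSON object with one key per field listed above and each value is a list of short phrases, might build the prompt from the review pools and validate the reply before storing it. The prompt wording, `build_prompt`, and `parse_extraction` are hypothetical, and the LLM call itself is left out.

```python
import json

# Field names follow the README; treating every field as a list of short phrases is an assumption.
FIELDS = ("mechanics", "narrative", "vibe", "structure_loop", "signature_tags",
          "niche_anchors", "identity_tags", "music_tags", "micro_tags")


def build_prompt(game_name, pools, max_per_pool=5):
    """Assemble a hypothetical extraction prompt from the classified review pools."""
    sections = []
    for category, reviews in pools.items():
        lines = "\n".join(f"- {r}" for r in reviews[:max_per_pool])
        sections.append(f"[{category} reviews]\n{lines}")
    return (
        f"Game: {game_name}\n\n" + "\n\n".join(sections)
        + "\n\nReturn only a JSON object with these keys: " + ", ".join(FIELDS)
        + ". Each value must be a list of short descriptive phrases."
    )


def parse_extraction(raw, max_items=12):
    """Validate the reply: require every field, collapse whitespace, dedupe case-insensitively."""
    data = json.loads(raw)
    missing = [f for f in FIELDS if f not in data]
    if missing:
        raise ValueError(f"LLM reply is missing fields: {missing}")
    cleaned = {}
    for field in FIELDS:
        values = data[field]
        if isinstance(values, str):  # tolerate a bare string instead of a list
            values = [values]
        seen, phrases = set(), []
        for value in values:
            phrase = " ".join(str(value).split())  # collapse runs of whitespace
            if phrase and phrase.lower() not in seen:
                seen.add(phrase.lower())
                phrases.append(phrase)
        cleaned[field] = phrases[:max_items]
    return cleaned
```

Validating at this boundary keeps malformed or duplicated model output from reaching the canonicalization stage.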
Generated tags frequently describe the same concept using different wording.
For example:
- Fast Action
- Quick Action
- High-Speed Combat

All three represent nearly identical ideas.
To solve this, I built a separate canonicalization pipeline using:
- heuristics
- fuzzy matching
- embedding similarity
- vector search
This groups semantically similar tags together while preserving niche distinctions.
Output:
steam_final_canon.db
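A minimal sketch of how heuristics, fuzzy matching, and embedding similarity might be combined into one greedy clustering pass. The thresholds, the `embed` callable, and the toy vectors are assumptions; a real run would use an embedding model, and the brute-force inner loop is where a vector index would answer a nearest-neighbor query instead.

```python
import math
import re
from difflib import SequenceMatcher


def normalize(tag):
    """Heuristic cleanup: lowercase, hyphens/underscores to spaces, drop other punctuation."""
    tag = tag.lower().replace("-", " ").replace("_", " ")
    tag = re.sub(r"[^a-z0-9 ]", "", tag)
    return " ".join(tag.split())


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def canonicalize(tag_counts, embed, fuzzy_threshold=0.9, embed_threshold=0.92):
    """Map each raw tag to a canonical tag; the most frequent spelling in a group wins.

    tag_counts: raw tag -> number of games it was generated for
    embed: callable returning a vector for a normalized tag
    """
    canon = []  # (canonical raw tag, normalized form, vector)
    mapping = {}
    for raw in sorted(tag_counts, key=tag_counts.get, reverse=True):
        norm = normalize(raw)
        vec = embed(norm)
        match = None
        # Brute-force scan over existing canonical tags.
        for c_raw, c_norm, c_vec in canon:
            if (norm == c_norm
                    or SequenceMatcher(None, norm, c_norm).ratio() >= fuzzy_threshold
                    or cosine(vec, c_vec) >= embed_threshold):
                match = c_raw
                break
        if match is None:
            canon.append((raw, norm, vec))
            match = raw
        mapping[raw] = match
    return mapping


# Made-up 3-d vectors standing in for real embeddings.
TOY_VECTORS = {
    "fast action": [0.90, 0.10, 0.00],
    "quick action": [0.88, 0.12, 0.00],
    "high speed combat": [0.85, 0.15, 0.05],
    "slow burn": [0.00, 0.20, 0.95],
    "roguelite": [0.10, 0.90, 0.10],
}
counts = {"Fast Action": 40, "Quick Action": 12, "Roguelite": 9,
          "High-Speed Combat": 7, "Slow Burn": 5, "Rogue-lite": 3}
mapping = canonicalize(counts, lambda t: TOY_VECTORS.get(t, [0.0, 0.0, 0.0]))
```

Here fuzzy matching catches spelling variants such as "Rogue-lite" vs "Roguelite", while embeddings catch paraphrases such as "Quick Action" that share few characters with "Fast Action". Keeping both thresholds high is one way to avoid merging distinct niche tags.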
Computing similarity between every pair of games at runtime would be expensive: with roughly 80,000 games, there are about 3.2 billion pairs.
Instead, NextSteamGame precomputes candidate relationships offline.
When a user searches:
- candidate games are retrieved
- user weighting is applied
- recommendations are reranked
This keeps the live application cheap to run, since each query only reranks a precomputed candidate list instead of scanning the full catalog, while still allowing real-time customization.
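A minimal sketch of the offline/online split, assuming each game has one vector per aspect. The aspect names, `k`, and the toy vectors are hypothetical. The offline step stores per-aspect similarities with each candidate, so the runtime step only reweights numbers that already exist.

```python
import heapq
import math

ASPECTS = ("mechanics", "narrative", "vibe", "music")  # hypothetical aspect names


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def precompute_candidates(games, k=200):
    """Offline: keep each game's k best neighbors by mean aspect similarity."""
    table = {}
    for gid, vecs in games.items():
        scored = []
        for other, other_vecs in games.items():
            if other == gid:
                continue
            sims = {a: cosine(vecs[a], other_vecs[a]) for a in ASPECTS}
            scored.append((sum(sims.values()) / len(ASPECTS), other, sims))
        best = heapq.nlargest(k, scored, key=lambda t: t[0])
        table[gid] = [(other, sims) for _, other, sims in best]
    return table


def rerank(table, gid, weights, top_n=10):
    """Runtime: re-score the stored candidates with the user's aspect weights."""
    total = sum(weights.get(a, 0.0) for a in ASPECTS) or 1.0
    scored = [
        (sum(weights.get(a, 0.0) * sims[a] for a in ASPECTS) / total, other)
        for other, sims in table.get(gid, [])
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]


# Made-up 2-d vectors: one game shares the reference's systems, the other its music and mood.
games = {
    "reference": {"mechanics": [1, 0], "narrative": [1, 0], "vibe": [1, 0], "music": [1, 0]},
    "social_sim": {"mechanics": [1, 0], "narrative": [0.9, 0.1], "vibe": [0, 1], "music": [0, 1]},
    "jazz_city": {"mechanics": [0, 1], "narrative": [0, 1], "vibe": [1, 0], "music": [1, 0]},
}
table = precompute_candidates(games, k=2)
```

With music and vibe weighted, `rerank(table, "reference", {"music": 1.0, "vibe": 1.0})` puts `jazz_city` first; weighting mechanics instead puts `social_sim` first, mirroring the two kinds of player in the Persona 5 example. The offline loop here is quadratic, so at catalog scale an approximate nearest-neighbor index would be the practical way to build the candidate lists.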
The stack is:
- backend: FastAPI
- frontend: Next.js / React
- runtime game store: Postgres
- retrieval target: local Chroma
- upstream build artifacts: SQLite
The app flow is:
- search for a Steam game
- open it as the reference profile
- inspect and adjust its focus vectors, identity tags, genres, and appeal axes
- rerank recommendations from the game's semantic profile
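A minimal sketch of how an edited reference profile might feed the rerank, assuming the candidates already carry per-aspect similarities plus identity tags and genres. `ReferenceProfile`, `profile_score`, the 0.3 tag weight, and folding appeal axes into the same weight dict are all assumptions for illustration.

```python
from dataclasses import dataclass, field, replace


@dataclass
class ReferenceProfile:
    """Hypothetical editable profile; appeal axes are folded into focus_weights here."""
    appid: int
    focus_weights: dict  # aspect -> weight set by the user
    identity_tags: set = field(default_factory=set)
    genres: set = field(default_factory=set)


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def profile_score(profile, candidate, tag_weight=0.3):
    """Blend weighted aspect similarity with overlap on the tags and genres the user kept."""
    total = sum(profile.focus_weights.values()) or 1.0
    aspect = sum(w * candidate["sims"].get(a, 0.0)
                 for a, w in profile.focus_weights.items()) / total
    overlap = (jaccard(profile.identity_tags, candidate["identity_tags"])
               + jaccard(profile.genres, candidate["genres"])) / 2
    return (1 - tag_weight) * aspect + tag_weight * overlap


def rank(profile, candidates, top_n=10):
    return sorted(candidates, key=lambda c: profile_score(profile, c), reverse=True)[:top_n]


# Made-up candidates with equal aspect similarity, differing only in identity tags.
candidates = [
    {"appid": 1, "sims": {"music": 0.5, "mechanics": 0.5},
     "identity_tags": {"jazz soundtrack"}, "genres": {"RPG"}},
    {"appid": 2, "sims": {"music": 0.5, "mechanics": 0.5},
     "identity_tags": {"social sim"}, "genres": {"RPG"}},
]
profile = ReferenceProfile(appid=0, focus_weights={"music": 1.0, "mechanics": 1.0},
                           identity_tags={"social sim"}, genres={"RPG"})
edited = replace(profile, identity_tags={"jazz soundtrack"})  # the user swaps the tag they care about
```

Swapping a single identity tag is enough to flip the top result here, which is the kind of explicit, explainable control the app flow is aiming for.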