Mentor-Mentee Matching System

Assignment Context

This project was implemented as part of a take-home technical assignment. All data used is synthetic CSV data created solely for demonstration purposes.

The core objective is to match mentees to mentors based on:

Research domain compatibility
Subdomain similarity

The system is designed to be deterministic, locally executable, and easy to evaluate. No API keys or external services are required to run this project.

System Architecture

The project follows a modular architecture separating the matching logic from the presentation layer:

Backend (Python 3.10+ / FastAPI): Handles CSV parsing, data validation, and the core matching algorithms. Uses scikit-learn for TF-IDF text similarity.
Frontend (Next.js / TypeScript): Provides a responsive interface for users to upload datasets and visualize matching results with confidence breakdowns.
CLI Utility: Enables batch processing of matching tasks.

The API and frontend components are optional enhancements built to demonstrate how the core matching logic can be exposed and visualized.

Algorithmic Approach

The matching logic follows the assignment-specified weighted scoring system (0–100 scale) to ensure transparent and reproducible results. The total confidence score is derived from two components:

1. Primary Domain Matching (70 points)

The system first evaluates the high-level research_domain field.

Exact Match: Awards 70 points.
Mismatch: Awards 0 points.

This all-or-nothing weighting prioritizes mentors who work in the same field of study as the mentee, as per the assignment requirements.
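The domain rule above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function name `domain_score` is hypothetical, and ignoring case and surrounding whitespace is an assumption, since the README only says "exact match" for this field.

```python
DOMAIN_POINTS = 70  # weight stated in this README


def domain_score(mentee_domain: str, mentor_domain: str) -> int:
    """Return 70 when the research_domain values match, otherwise 0."""
    # Assumption: letter case and surrounding whitespace are ignored.
    left = mentee_domain.strip().casefold()
    right = mentor_domain.strip().casefold()
    return DOMAIN_POINTS if left == right else 0


print(domain_score("AI", "AI"))       # same domain
print(domain_score("AI", "Systems"))  # different domain
```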

2. Subdomain Similarity (Up to 30 points)

This component refines the match using the more specific subdomain field. The scoring cascades through the following tiers, and the first tier that applies determines the points:

Exact Match (30 pts): Identical strings (case-insensitive).
Containment (24 pts): One term is a substring of the other (e.g., "Vision" in "Computer Vision").
Predefined Related Subdomains (21 pts): Uses mappings to capture common academic overlaps (e.g., NLP ↔ LLMs, Cryptography ↔ Network Security).
Shared Categories (15 pts): Both subdomains map to a common parent category.
Semantic Similarity (Variable): If no direct relationship is found, the system builds TF-IDF (Term Frequency-Inverse Document Frequency) vectors for the two terms and scores them by cosine similarity. Because TF-IDF compares weighted word overlap, this detects related subdomains that share vocabulary even when they are phrased differently.
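The cascade above might look roughly like the following sketch. Several parts are assumptions made only for illustration: the `RELATED` and `CATEGORY` tables are hypothetical stand-ins for the project's real mappings, the function name and the `fallback_cap` parameter are invented, and scaling the similarity fallback to at most 15 points is a guess, because the README describes that tier only as "Variable".

```python
from typing import Callable, Optional

# Hypothetical lookup tables; the project's real mappings are not shown in this README.
RELATED = {
    frozenset({"nlp", "llms"}),
    frozenset({"cryptography", "network security"}),
}
CATEGORY = {
    "computer vision": "perception",
    "robotics perception": "perception",
}


def subdomain_score(
    a: str,
    b: str,
    similarity: Optional[Callable[[str, str], float]] = None,
    fallback_cap: float = 15.0,  # assumed cap for the similarity tier
) -> float:
    """Return the points of the first tier that applies to the two subdomains."""
    a, b = a.strip().casefold(), b.strip().casefold()
    if a == b:
        return 30.0  # exact match (case-insensitive)
    if a and b and (a in b or b in a):
        return 24.0  # containment; empty strings are excluded on purpose
    if frozenset({a, b}) in RELATED:
        return 21.0  # predefined related subdomains
    category = CATEGORY.get(a)
    if category is not None and category == CATEGORY.get(b):
        return 15.0  # shared parent category
    if similarity is None:
        return 0.0
    # Fallback: clamp the similarity to [0, 1] and scale it by the assumed cap.
    return round(fallback_cap * max(0.0, min(1.0, similarity(a, b))), 2)


print(subdomain_score("Vision", "Computer Vision"))
print(subdomain_score("NLP", "LLMs"))
```

Passing the similarity function in as an argument keeps the rule-based tiers testable without fitting any text model.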

Output Format

The system produces a deterministic output file (JSON or CSV) containing the following key fields:

mentee_name: Name of the student.
matched_mentor: Name of the assigned mentor.
confidence_score: A numeric score (0-100) indicating match quality.
match_reason: A human-readable explanation of why the match was made.

Example CSV Output

mentee_name,matched_mentor,confidence_score,match_reason
Aanya N,Dr. Sharma,100,"Same domain (AI) and same subdomain (Computer Vision)"
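As a rough sketch of how rows with these four fields could be serialized to either format, the standard library is enough. The helper names `to_csv` and `to_json`, the sample rows, and the choice to sort by mentee name for a stable order are all assumptions for illustration; the project's actual writer may differ.

```python
import csv
import io
import json

FIELDS = ["mentee_name", "matched_mentor", "confidence_score", "match_reason"]


def _ordered(rows):
    # Assumption: sorting by mentee name keeps the output order stable.
    return sorted(rows, key=lambda row: row["mentee_name"])


def to_csv(rows) -> str:
    """Serialize match rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(_ordered(rows))
    return buffer.getvalue()


def to_json(rows) -> str:
    """Serialize match rows to a JSON array."""
    return json.dumps(_ordered(rows), indent=2, ensure_ascii=False)


# Made-up sample rows, not taken from the project's data.
rows = [
    {"mentee_name": "Mentee B", "matched_mentor": "Mentor Y",
     "confidence_score": 70, "match_reason": "Same domain (AI)"},
    {"mentee_name": "Mentee A", "matched_mentor": "Mentor X",
     "confidence_score": 70, "match_reason": "Same domain (AI)"},
]
print(to_csv(rows))
```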

Prerequisites

Python 3.10 or higher
Node.js 18 or higher (for the optional frontend)
npm or yarn

Installation and Execution

Backend API & CLI

The backend handles the core logic.

Navigate to the backend directory:
```
cd backend
```
Install Python dependencies:
```
pip install -r requirements.txt
```
Run the CLI (Recommended for evaluation): This allows you to match CSV files directly from the terminal.
```
python cli.py --mentees ../sample_mentees.csv --mentors ../sample_mentors.csv --output results.json
```
Arguments:
- --mentees: Path to the mentees CSV file (Required)
- --mentors: Path to the mentors CSV file (Required)
- --output: Destination path for the result file (Default: output.json)
- --format: Output format, either json or csv (Default: json)
(Optional) Run the API Server:
```
python main.py
```
The API will be available at http://localhost:8000.
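For reference, the CLI arguments listed above map naturally onto `argparse`. The sketch below is an assumption about how such a parser could be declared, not a copy of `cli.py`; the function name `build_parser` and the help strings are illustrative.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the four flags documented in this README."""
    parser = argparse.ArgumentParser(description="Match mentees to mentors from CSV files.")
    parser.add_argument("--mentees", required=True, help="Path to the mentees CSV file")
    parser.add_argument("--mentors", required=True, help="Path to the mentors CSV file")
    parser.add_argument("--output", default="output.json", help="Destination path for the result file")
    parser.add_argument("--format", choices=["json", "csv"], default="json", help="Output format")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(
    ["--mentees", "../sample_mentees.csv", "--mentors", "../sample_mentors.csv"]
)
print(args.output, args.format)
```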

Frontend Application (Optional)

The frontend provides a graphical user interface for the system.

Navigate to the root directory (containing package.json).
Install dependencies:
```
npm install
```
Start the development server:
```
npm run dev
```
Access the application at http://localhost:3000.

API Documentation (Optional)

POST /match

Accepts JSON payloads of mentees and mentors to process matches.

Request Body:

{
  "mentees": [{"name": "Student A", "research_domain": "AI", "subdomain": "NLP", ...}],
  "mentors": [{"name": "Prof B", "research_domain": "AI", "subdomain": "LLMs", ...}]
}
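A client call to this endpoint could be prepared with the standard library as sketched below. The helper name `build_match_request` is hypothetical, and the payload contains only the three fields shown in the request body above; any further fields the server expects are elided in this README and are not guessed here. The response format is not documented, so the example only prints whatever JSON comes back.

```python
import json
import urllib.request


def build_match_request(mentees, mentors, base_url="http://localhost:8000"):
    """Build (but do not send) a POST /match request with a JSON body."""
    body = json.dumps({"mentees": mentees, "mentors": mentors}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/match",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_match_request(
    mentees=[{"name": "Student A", "research_domain": "AI", "subdomain": "NLP"}],
    mentors=[{"name": "Prof B", "research_domain": "AI", "subdomain": "LLMs"}],
)

# With the API server running locally, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```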

POST /match/csv

Accepts multipart/form-data uploads of CSV files.

Design Decisions

Deterministic Logic: The matching service is deterministic, so the same input always produces exactly the same output.
Local Text Analysis: TF-IDF models are fitted locally to the provided batch, so term weights reflect the vocabulary of the input dataset and no external model or service is needed.
Type Safety: Pydantic models (Backend) and TypeScript interfaces (Frontend) are strictly enforced to minimize runtime errors and ensure data integrity.
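To illustrate the "fitted locally to the provided batch" idea, here is a minimal sketch using scikit-learn's `TfidfVectorizer` and `cosine_similarity`. The helper names, the default word-level tokenization, and the sample subdomains are assumptions; the project's actual vectorizer settings are not described in this README.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def fit_subdomain_vectorizer(subdomains):
    """Fit TF-IDF on the subdomain strings of the current batch only."""
    vectorizer = TfidfVectorizer(lowercase=True)
    vectorizer.fit(subdomains)
    return vectorizer


def tfidf_similarity(vectorizer, a: str, b: str) -> float:
    """Cosine similarity between the TF-IDF vectors of two subdomain strings."""
    vectors = vectorizer.transform([a, b])
    return float(cosine_similarity(vectors[0:1], vectors[1:2])[0, 0])


# Made-up batch of subdomains, used only for this example.
batch = ["Computer Vision", "Medical Image Vision", "Network Security"]
vectorizer = fit_subdomain_vectorizer(batch)
print(tfidf_similarity(vectorizer, "Computer Vision", "Medical Image Vision"))
print(tfidf_similarity(vectorizer, "Computer Vision", "Network Security"))
```

Fitting involves no randomness, so repeated runs on the same batch give the same scores, which is consistent with the determinism goal stated above.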
