
aneeshsrinivas/MentorMatch


Mentor-Mentee Matching System

Assignment Context

This project was implemented as part of a take-home technical assignment. All data used is synthetic CSV data created solely for demonstration purposes.

The core objective is to match mentees to mentors based on:

  • Research domain compatibility
  • Subdomain similarity

The system is designed to be deterministic, locally executable, and easy to evaluate. No API keys or external services are required to run this project.

System Architecture

The project follows a modular architecture separating the matching logic from the presentation layer:

  • Backend (Python 3.10+ / FastAPI): Handles CSV parsing, data validation, and core matching algorithms. Uses scikit-learn for semantic text analysis.
  • Frontend (Next.js / TypeScript): Provides a responsive interface for users to upload datasets and visualize matching results with confidence breakdowns.
  • CLI Utility: Enables batch processing of matching tasks.

The API and frontend components are optional enhancements built to demonstrate how the core matching logic can be exposed and visualized.

Algorithmic Approach

The matching logic follows the assignment-specified weighted scoring system (0–100 scale) to ensure transparent and reproducible results. The total confidence score is derived from two components:

1. Primary Domain Matching (70 points)

The system first evaluates the high-level research_domain field.

  • Exact Match: Awards 70 points.
  • Mismatch: Awards 0 points.

This all-or-nothing rule gives priority to matches grounded in the same field of study, as the assignment requires. A pair with mismatched domains can still earn up to 30 points from subdomain similarity.
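The primary-domain rule is simple enough to express directly. Below is a minimal Python sketch, assuming the comparison ignores case and surrounding whitespace the way the subdomain rule does (the README only says "exact match" here). The names `domain_score` and `DOMAIN_POINTS` are illustrative and not taken from the project.

```python
DOMAIN_POINTS = 70  # weight for an exact research_domain match


def _normalize(value: str) -> str:
    """Assumed normalization: trim surrounding whitespace and ignore case."""
    return value.strip().casefold()


def domain_score(mentee_domain: str, mentor_domain: str) -> int:
    """All-or-nothing primary score: 70 for an exact match, otherwise 0."""
    if _normalize(mentee_domain) == _normalize(mentor_domain):
        return DOMAIN_POINTS
    return 0


print(domain_score("AI", "ai"))        # 70
print(domain_score("AI", "Security"))  # 0
```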

2. Subdomain Similarity (Up to 30 points)

Refines the match based on specific areas of interest (subdomain). The scoring cascades through the following logic:

  1. Exact Match (30 pts): Identical strings (case-insensitive).
  2. Containment (24 pts): One term is a substring of the other (e.g., "Vision" in "Computer Vision").
  3. Predefined Related Subdomains (21 pts): Uses mappings to capture common academic overlaps (e.g., NLP ↔ LLMs, Cryptography ↔ Network Security).
  4. Shared Categories (15 pts): Both subdomains map to a common parent category.
  5. Semantic Similarity (Variable): If none of the rules above applies, the system converts both subdomains into TF-IDF (Term Frequency-Inverse Document Frequency) vectors and scores the pair by their cosine similarity. This can detect related concepts whose wording only partly overlaps.
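The cascade can be sketched as a chain of early returns. This is a dependency-free illustration, not the project's code. The `RELATED` and `CATEGORIES` tables are hypothetical stand-ins for the project's mappings. The TF-IDF step reimplements scikit-learn's default smoothed IDF with L2 normalization in plain Python instead of calling `TfidfVectorizer`. The semantic tier is assumed to scale cosine similarity onto a cap of 15 points (`SEMANTIC_MAX`), because the README does not state its range.

```python
import math
import re
from collections import Counter

# Hypothetical lookup tables; the project's real mappings are not shown in the README.
RELATED = {
    frozenset({"nlp", "llms"}),
    frozenset({"cryptography", "network security"}),
}
CATEGORIES = {
    "computer vision": "ai",
    "nlp": "ai",
    "llms": "ai",
    "cryptography": "security",
    "network security": "security",
}
SEMANTIC_MAX = 15  # assumed cap for the semantic tier (README only says "Variable")


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(corpus: list[str]) -> dict[str, float]:
    """Smoothed IDF, as in scikit-learn's default: ln((1 + n) / (1 + df)) + 1."""
    n = len(corpus)
    df = Counter(t for doc in corpus for t in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf_vector(text: str, idf: dict[str, float]) -> dict[str, float]:
    """Raw term counts weighted by IDF, then L2-normalized."""
    counts = Counter(tokenize(text))
    vec = {t: c * idf.get(t, 0.0) for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are L2-normalized, so their dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def subdomain_score(a: str, b: str, idf: dict[str, float]) -> int:
    x, y = a.strip().lower(), b.strip().lower()
    if x == y:
        return 30  # exact match (case-insensitive)
    if x in y or y in x:
        return 24  # containment
    if frozenset({x, y}) in RELATED:
        return 21  # predefined related subdomains
    if x in CATEGORIES and CATEGORIES.get(x) == CATEGORIES.get(y):
        return 15  # shared parent category
    sim = cosine(tfidf_vector(x, idf), tfidf_vector(y, idf))
    return round(SEMANTIC_MAX * sim)  # semantic fallback


# Fit IDF weights on the subdomains present in the current batch.
batch = ["Computer Vision", "Vision", "NLP", "LLMs", "Cryptography",
         "Network Security", "Robot Learning", "Reinforcement Learning"]
idf = fit_idf(batch)
print(subdomain_score("Computer Vision", "computer vision", idf))  # 30
print(subdomain_score("Vision", "Computer Vision", idf))           # 24
print(subdomain_score("NLP", "LLMs", idf))                         # 21
print(subdomain_score("Robot Learning", "Reinforcement Learning", idf))
```

Fitting the IDF weights on the current batch mirrors the project's stated choice to fit TF-IDF locally. One consequence is that the same pair of subdomains can receive a slightly different semantic score when it appears in a different batch.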

Output Format

The system produces a deterministic output file (JSON or CSV) containing the following key fields:

  • mentee_name: Name of the student.
  • matched_mentor: Name of the assigned mentor.
  • confidence_score: A numeric score (0-100) indicating match quality.
  • match_reason: A human-readable explanation of why the match was made.

Example CSV Output

mentee_name,matched_mentor,confidence_score,match_reason
Aanya N,Dr. Sharma,100,"Same domain (AI) and same subdomain (Computer Vision)"
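Writing the documented fields in either format needs only the standard library. The following sketch assumes each result is a flat dict keyed by the four field names above. `render_results` is a hypothetical helper, and the sample row (70 domain points plus 21 for the related pair NLP and LLMs) is illustrative, not real output.

```python
import csv
import io
import json

FIELDS = ["mentee_name", "matched_mentor", "confidence_score", "match_reason"]


def render_results(rows: list[dict], fmt: str = "json") -> str:
    """Serialize match rows as JSON or CSV text, the two documented formats."""
    if fmt == "json":
        # sort_keys keeps key order stable, so identical input gives identical text
        return json.dumps(rows, indent=2, sort_keys=True)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt!r}")


rows = [{
    "mentee_name": "Student A",
    "matched_mentor": "Prof B",
    "confidence_score": 91,
    "match_reason": "Same domain (AI) and related subdomains (NLP, LLMs)",
}]
print(render_results(rows, "csv"))
```

The CSV writer quotes `match_reason` here only because it contains a comma.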

Prerequisites

  • Python 3.10 or higher
  • Node.js 18 or higher (for the optional frontend)
  • npm or yarn

Installation and Execution

Backend API & CLI

The backend handles the core logic.

  1. Navigate to the backend directory:

    cd backend
  2. Install Python dependencies:

    pip install -r requirements.txt
  3. Run the CLI (Recommended for evaluation): This allows you to match CSV files directly from the terminal.

    python cli.py --mentees ../sample_mentees.csv --mentors ../sample_mentors.csv --output results.json

    Arguments:

    • --mentees: Path to the mentees CSV file (Required)
    • --mentors: Path to the mentors CSV file (Required)
    • --output: Destination path for the result file (Default: output.json)
    • --format: Output format, either json or csv (Default: json)
  4. (Optional) Run the API Server:

    python main.py

    The API will be available at http://localhost:8000.
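The CLI flags documented in step 3 map directly onto `argparse`. The parser below is a sketch of an equivalent interface, not the project's actual `cli.py`; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Match mentees to mentors from CSV files.")
    parser.add_argument("--mentees", required=True, help="Path to the mentees CSV file")
    parser.add_argument("--mentors", required=True, help="Path to the mentors CSV file")
    parser.add_argument("--output", default="output.json",
                        help="Destination path for the result file")
    parser.add_argument("--format", choices=["json", "csv"], default="json",
                        help="Output format")
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args([
    "--mentees", "../sample_mentees.csv",
    "--mentors", "../sample_mentors.csv",
    "--output", "results.json",
])
print(args.format)  # json
```

Using `choices` makes `argparse` reject any format other than `json` or `csv` before matching starts.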

Frontend Application (Optional)

The frontend provides a graphical user interface for the system.

  1. Navigate to the root directory (containing package.json).

  2. Install dependencies:

    npm install
  3. Start the development server:

    npm run dev

    Access the application at http://localhost:3000.

API Documentation (Optional)

POST /match

Accepts a JSON payload containing a list of mentees and a list of mentors, and runs the matching on them.

Request Body:

{
  "mentees": [{"name": "Student A", "research_domain": "AI", "subdomain": "NLP", ...}],
  "mentors": [{"name": "Prof B", "research_domain": "AI", "subdomain": "LLMs", ...}]
}

POST /match/csv

Accepts multipart/form-data uploads of CSV files.
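A client can call POST /match with nothing beyond the standard library. This sketch only builds the request. It includes just the three fields visible in the request body above (the real schema may require the fields elided by `...`), assumes the server runs on the default http://localhost:8000, and makes no assumption about the response shape. `build_match_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_match_request(mentees: list[dict], mentors: list[dict],
                        base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a POST /match request with a JSON body."""
    body = json.dumps({"mentees": mentees, "mentors": mentors}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/match",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_match_request(
    [{"name": "Student A", "research_domain": "AI", "subdomain": "NLP"}],
    [{"name": "Prof B", "research_domain": "AI", "subdomain": "LLMs"}],
)
# With the API server running (python main.py), send it with:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```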

Design Decisions

  • Deterministic Logic: The matching service is designed to be deterministic, ensuring that the same input always produces the exact same output.
  • Local Text Analysis: TF-IDF models are fitted locally on each input batch, so semantic matching adapts to the vocabulary of that dataset and needs no pretrained models or network access.
  • Type Safety: Pydantic models validate data at runtime in the backend, and TypeScript interfaces type-check the frontend at compile time. Together they catch malformed data early and reduce runtime errors.
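One common way to keep mentor selection deterministic is to break score ties with a fixed rule. The sketch below assumes ties are resolved alphabetically by mentor name and that each mentee is matched independently; the README does not describe tie-breaking or mentor capacity. `pick_mentor` is a hypothetical name.

```python
def pick_mentor(scores: dict[str, int]) -> tuple[str, int]:
    """Return (mentor_name, score) with the highest score.

    Ties are broken alphabetically by mentor name (an assumed rule), so the
    result does not depend on the order in which mentors were supplied.
    """
    if not scores:
        raise ValueError("no mentors to choose from")
    # Sort key: higher score first, then alphabetical name.
    return min(scores.items(), key=lambda item: (-item[1], item[0]))


scores = {"Prof C": 91, "Prof B": 91, "Dr. D": 70}
print(pick_mentor(scores))  # ('Prof B', 91)
```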

About

A deterministic mentor-mentee matching engine that pairs candidates based on research domain compatibility and semantic subdomain similarity
