MapReduce in Go

This project is a simplified MapReduce framework implemented in Go, inspired by Google's classic paper MapReduce: Simplified Data Processing on Large Clusters. It shows how large-scale distributed computation can be broken into Map and Reduce phases, coordinated by a central Coordinator and executed by multiple Worker processes running in parallel.


Features

  • ✅ Coordinator & Worker architecture with RPC communication
  • ✅ Concurrent job scheduling and monitoring
  • ✅ Fault-tolerant re-assignment of tasks if a worker fails or times out (see the sketch after this list)
  • ✅ Word count example using custom mapF and reduceF functions
  • ✅ Threaded design: Coordinator and Worker run concurrently
  • ✅ Modular code — easily plug in new Map/Reduce logic
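
The fault-tolerant re-assignment above can be sketched as a background goroutine on the coordinator that re-queues any task held past a deadline. The type and field names below are illustrative, not necessarily those in mr/coordinator.go (uses "sync" and "time"):

// Illustrative sketch of timeout-based re-assignment; the real
// logic lives in mr/coordinator.go.
func (c *Coordinator) reclaimStaleTasks() {
    for {
        time.Sleep(time.Second)
        c.mu.Lock()
        for i := range c.tasks {
            t := &c.tasks[i]
            // A task held longer than 10s is presumed lost and re-queued.
            if t.state == inProgress && time.Since(t.startTime) > 10*time.Second {
                t.state = idle // the next worker that asks gets it
            }
        }
        c.mu.Unlock()
    }
}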

Project Structure

mapreduce/
│
├── main.go              # Entry point for Coordinator or Worker
│
├── mr/
│   ├── coordinator.go   # Coordinator: assigns jobs & tracks progress
│   ├── worker.go        # Worker: executes map/reduce tasks
│   ├── rpc.go           # Common RPC definitions & utilities
│
├── input/               # Input files for word count
└── output/              # Output files generated by reduce tasks

⚙️ How It Works

1. Coordinator

  • Takes input files and splits them into Map tasks.
  • Assigns tasks to available Workers through RPC calls.
  • Tracks completion and transitions to Reduce phase.
  • Exposes Done(), which the main process polls periodically to learn when all jobs are finished.
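
A rough sketch of that assignment handler, continuing the illustrative names from the sketch above (the actual definitions live in mr/coordinator.go and mr/rpc.go):

// Illustrative sketch of the task-assignment RPC handler.
func (c *Coordinator) RequestTask(args *TaskArgs, reply *TaskReply) error {
    c.mu.Lock()
    defer c.mu.Unlock()
    for i := range c.mapTasks {
        if c.mapTasks[i].state == idle {
            c.mapTasks[i].state = inProgress
            c.mapTasks[i].startTime = time.Now()
            reply.Type = MapTask
            reply.TaskID = i
            reply.File = c.mapTasks[i].file
            return nil
        }
    }
    // Once every map task is done, reduce tasks are handed out the
    // same way; until then the worker is told to wait and retry.
    reply.Type = WaitTask
    return nil
}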

2. Worker

  • Connects to the coordinator via RPC.
  • Requests work (RequestTask), executes map or reduce using user-defined functions.
  • Writes intermediate files (mr-X-Y) for map outputs and final mr-out-* files for reduce results.
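
Under the same naming assumptions, the worker's main loop might look like this. Here call, doMap, and doReduce are hypothetical helpers standing in for the code in mr/worker.go:

// Illustrative worker loop; helper names are placeholders.
func Worker(mapF func(string, string) []KeyValue,
    reduceF func(string, []string) string) {
    for {
        reply := TaskReply{}
        if ok := call("Coordinator.RequestTask", &TaskArgs{}, &reply); !ok {
            return // coordinator unreachable: the job is presumably done
        }
        switch reply.Type {
        case MapTask:
            doMap(reply, mapF) // writes mr-<TaskID>-<bucket> files
        case ReduceTask:
            doReduce(reply, reduceF) // merges buckets into mr-out-<TaskID>
        case WaitTask:
            time.Sleep(500 * time.Millisecond)
        }
    }
}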

3. Map & Reduce Functions

Defined in main.go:

// Requires "strings" and "strconv" in main.go's imports.
func mapF(filename string, contents string) []mr.KeyValue {
    // Split the text by whitespace and emit a (word, "1") pair per word.
    kva := []mr.KeyValue{}
    for _, word := range strings.Fields(contents) {
        kva = append(kva, mr.KeyValue{Key: word, Value: "1"})
    }
    return kva
}

func reduceF(key string, values []string) string {
    // Every value is "1", so the count is just the number of values.
    return strconv.Itoa(len(values))
}
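
Map output is split into nreduce intermediate buckets by hashing each key, which is where the mr-X-Y file names come from (X = map task ID, Y = reduce bucket). The hash itself is assumed here to be FNV-1a, a common choice for this kind of framework:

// Assumed partitioning function; uses "hash/fnv".
func ihash(key string) int {
    h := fnv.New32a()
    h.Write([]byte(key))
    return int(h.Sum32() & 0x7fffffff)
}

// Each map output pair goes into bucket ihash(kv.Key) % nreduce,
// i.e. into the file mr-<mapTaskID>-<bucket>.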

Running the Example

1️⃣ Build the Project

go build -o mapreduce .

(Optional: the steps below invoke go run directly, so this mainly confirms that everything compiles.)

2️⃣ Start the Coordinator

Run the coordinator with input files:

go run main.go coordinator -nreduce=3 input/pg-being_ernest.txt input/pg-grimm.txt input/pg-huckleberry.txt

This starts the Coordinator in a goroutine and monitors job completion via Done().
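
In outline, that entry point looks something like the following. MakeCoordinator is a hypothetical constructor name, and the real main.go may parse its flags differently:

// Simplified sketch of the entry point in main.go.
func main() {
    if os.Args[1] == "coordinator" {
        fs := flag.NewFlagSet("coordinator", flag.ExitOnError)
        nReduce := fs.Int("nreduce", 3, "number of reduce tasks")
        fs.Parse(os.Args[2:])
        c := mr.MakeCoordinator(fs.Args(), *nReduce) // fs.Args() = input files
        for !c.Done() {
            time.Sleep(time.Second) // poll until every task is finished
        }
        return
    }
    mr.Worker(mapF, reduceF)
}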

3️⃣ Start Workers

In separate terminals (or as background processes):

go run main.go worker

Each worker connects to the coordinator and starts processing map/reduce tasks.

4️⃣ Wait for Completion

Once all tasks are finished:

Coordinator: all tasks finished
Coordinator exited cleanly

Output files will appear in your working directory (e.g., mr-out-0, mr-out-1, ...).


Example Output

cat mr-out-0
the 120
and 95
to 80
...
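
Since each reduce task writes its own partition, a combined, frequency-sorted view takes one extra shell step, for example:

cat mr-out-* | sort -k2 -nr | head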

Concurrency Model

  • Coordinator runs in the main process with a background goroutine checking Done().
  • Worker runs in its own goroutine to handle tasks concurrently.
  • Communication happens via Go’s RPC system.
  • The main thread sleeps periodically and exits only when all jobs are marked done.
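
That RPC layer is built on Go's standard net/rpc package. A minimal sketch of the server side follows; the transport shown here, HTTP over a Unix socket, is an assumption, and mr/rpc.go defines the actual one (uses "log", "net", "net/http", "net/rpc", "os"):

// Illustrative server setup for the coordinator.
func (c *Coordinator) serve() {
    rpc.Register(c)  // expose exported methods such as RequestTask
    rpc.HandleHTTP()
    sock := "/tmp/mr-socket"
    os.Remove(sock)
    l, err := net.Listen("unix", sock)
    if err != nil {
        log.Fatal("listen error:", err)
    }
    go http.Serve(l, nil) // answer RPCs in a background goroutine
}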

Configuration Options

Flag          Description                       Default
-nreduce      Number of reduce tasks            3
input files   List of files for the map phase   Required (positional)

Extending the Framework

You can implement your own computation by modifying mapF and reduceF. For example:

  • Word count variants → counting unique words instead of raw occurrences
  • Inverted index → mapping words to document lists
  • Log analysis → grouping by IP or time range

Each new computation only requires changing the Map and Reduce logic — the coordinator and worker logic remains untouched.
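
For instance, the inverted index needs only these two replacements (a sketch; uses "sort" and "strings"):

// Inverted index: emit (word, filename) instead of (word, "1").
func mapF(filename string, contents string) []mr.KeyValue {
    kva := []mr.KeyValue{}
    for _, word := range strings.Fields(contents) {
        kva = append(kva, mr.KeyValue{Key: word, Value: filename})
    }
    return kva
}

// Deduplicate and join the documents each word appears in.
func reduceF(key string, values []string) string {
    seen := map[string]bool{}
    docs := []string{}
    for _, v := range values {
        if !seen[v] {
            seen[v] = true
            docs = append(docs, v)
        }
    }
    sort.Strings(docs)
    return strings.Join(docs, ",")
}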


References

  • Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI, 2004.

Author

Uttam Raj
B.Tech in Computational Mathematics @ NIT Agartala
Passionate about systems programming, distributed systems, and Go.
