This project is a simplified MapReduce framework implemented in Go, inspired by the classic MapReduce: Simplified Data Processing on Large Clusters (Google) paper. It showcases how large-scale distributed computation can be achieved by dividing tasks into Map and Reduce phases, coordinated by a central Coordinator and executed by multiple parallel Worker processes.
- ✅ Coordinator & Worker architecture with RPC communication
- ✅ Concurrent job scheduling and monitoring
- ✅ Fault-tolerant re-assignment of tasks (if worker fails or times out)
- ✅ Word count example using custom `mapF` and `reduceF` functions
- ✅ Threaded design: Coordinator and Worker run concurrently
- ✅ Modular code — easily plug in new Map/Reduce logic
```
mapreduce/
│
├── main.go            # Entry point for Coordinator or Worker
│
├── mr/
│   ├── coordinator.go # Coordinator: assigns jobs & tracks progress
│   ├── worker.go      # Worker: executes map/reduce tasks
│   ├── rpc.go         # Common RPC definitions & utilities
│
├── input/             # Input files for word count
└── output/            # Output files generated by reduce tasks
```
- Takes input files and splits them into Map tasks.
- Assigns tasks to available Workers through RPC calls.
- Tracks completion and transitions to Reduce phase.
- Calls `Done()` periodically to check whether all jobs are finished.
- Connects to the Coordinator via RPC.
- Requests work (`RequestTask`) and executes map or reduce tasks using user-defined functions.
- Writes intermediate files (`mr-X-Y`) for map outputs and merged files for reduce results.
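The `mr-X-Y` naming ties each intermediate file to map task `X` and reduce bucket `Y`, where `Y` is chosen by hashing the key. A sketch of that partitioning, using the FNV-1a hash common in MapReduce skeletons (the actual hash in this project's `worker.go` may differ):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ihash maps a key to a non-negative integer so that every occurrence of
// the same word lands in the same reduce bucket, and therefore in the
// same mr-X-Y intermediate file.
func ihash(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() & 0x7fffffff)
}

// intermediateName builds the mr-X-Y filename for map task X, reduce bucket Y.
func intermediateName(mapTask, reduceBucket int) string {
	return fmt.Sprintf("mr-%d-%d", mapTask, reduceBucket)
}

func main() {
	nReduce := 3
	bucket := ihash("hello") % nReduce // same key, same bucket, every time
	fmt.Println(intermediateName(2, bucket))
}
```

Deterministic hashing is what lets each reduce task read exactly the `mr-*-Y` files for its own bucket.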
Defined in `main.go`:

```go
func mapF(filename string, contents string) []mr.KeyValue {
	// Split text by whitespace and emit (word, "1") pairs.
	kva := []mr.KeyValue{}
	for _, word := range strings.Fields(contents) {
		kva = append(kva, mr.KeyValue{Key: word, Value: "1"})
	}
	return kva
}

func reduceF(key string, values []string) string {
	// Return the count of occurrences for each key.
	return strconv.Itoa(len(values))
}
```

Build the binary:

```sh
go build -o mapreduce .
```

Run the coordinator with input files:
```sh
go run main.go coordinator -nreduce=3 input/pg-being_ernest.txt input/pg-grimm.txt input/pg-huckleberry.txt
```

This starts the Coordinator in a goroutine and monitors job completion via `Done()`.
In separate terminals (or as background processes):
```sh
go run main.go worker
```

Each worker connects to the coordinator and starts processing map/reduce tasks.
Once all tasks are finished:
```
Coordinator: all tasks finished
Coordinator exited cleanly
```
Output files will appear in your working directory (e.g., mr-out-0, mr-out-1, ...).
```sh
cat mr-out-0
```

```
the 120
and 95
to 80
...
```

- Coordinator runs in the main process with a background goroutine checking `Done()`.
- Worker runs in its own goroutine to handle tasks concurrently.
- Communication happens via Go's RPC system.
- The main thread sleeps periodically and exits only when all jobs are marked done.
| Flag | Description | Default |
|---|---|---|
| `-nreduce` | Number of reduce tasks | 3 |
| Input files | List of files for the map phase | Required |
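The flags in the table can be handled with the standard library's `flag` package. This is a sketch of one way to do it (the `parseArgs` helper is illustrative, not necessarily how `main.go` is written):

```go
package main

import (
	"flag"
	"fmt"
)

// parseArgs reads -nreduce (default 3) and treats the remaining
// positional arguments as the input files for the map phase.
func parseArgs(args []string) (nReduce int, files []string, err error) {
	fs := flag.NewFlagSet("coordinator", flag.ContinueOnError)
	n := fs.Int("nreduce", 3, "number of reduce tasks")
	if err := fs.Parse(args); err != nil {
		return 0, nil, err
	}
	return *n, fs.Args(), nil
}

func main() {
	n, files, err := parseArgs([]string{"-nreduce=5", "input/a.txt", "input/b.txt"})
	if err != nil {
		panic(err)
	}
	fmt.Println(n, files)
}
```

Note that Go's `flag` package stops parsing at the first positional argument, so the flag must come before the file list, as in the coordinator command shown earlier.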
You can implement your own computation by modifying `mapF` and `reduceF`.
For example:
- Word count → counting occurrences of each word
- Inverted index → mapping words to the documents that contain them
- Log analysis → grouping entries by IP address or time range
Each new computation only requires changing the Map and Reduce logic — the coordinator and worker logic remains untouched.
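As an illustration of swapping only the user functions, here is one hypothetical way the inverted-index variant could look. The local `KeyValue` type mirrors `mr.KeyValue` so the sketch is self-contained; the real functions would use the `mr` package directly.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KeyValue mirrors mr.KeyValue for a self-contained example.
type KeyValue struct{ Key, Value string }

// invertedMapF emits (word, document) pairs instead of (word, "1").
func invertedMapF(doc string, contents string) []KeyValue {
	kva := []KeyValue{}
	for _, w := range strings.Fields(contents) {
		kva = append(kva, KeyValue{Key: w, Value: doc})
	}
	return kva
}

// invertedReduceF de-duplicates and sorts the document names for a word,
// producing one comma-separated posting list per key.
func invertedReduceF(key string, values []string) string {
	seen := map[string]bool{}
	docs := []string{}
	for _, v := range values {
		if !seen[v] {
			seen[v] = true
			docs = append(docs, v)
		}
	}
	sort.Strings(docs)
	return strings.Join(docs, ",")
}

func main() {
	fmt.Println(invertedReduceF("go", []string{"b.txt", "a.txt", "b.txt"})) // a.txt,b.txt
}
```

The coordinator, worker, and RPC plumbing are untouched; only the two functions change.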
- MapReduce: Simplified Data Processing on Large Clusters (Google)
- MIT 6.824: Distributed Systems Labs
- Go Official Documentation: https://golang.org/doc/
Uttam Raj, B.Tech in Computational Mathematics @ NIT Agartala. Passionate about systems programming, distributed systems, and Go.