An NFS-style distributed file-synchronisation system in C: an nfs_manager orchestrates copies across multiple machines, talking to per-host nfs_client agents over a custom PUSH/PULL TCP protocol. Multi-threaded throughout — thread pool on the manager, per-connection threads on the clients.
Second homework of the System Programming course at the University of Athens (Department of Informatics & Telecommunications). Solo project. Tested by SSH'ing into actual lab machines and running real cross-host sync jobs.
The manager reads source→destination pairs from a config file (each side specified as `host:port:/path`), then keeps the destination side in sync with the source by issuing PULL requests to the source client and PUSH requests to the destination client.
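For a concrete picture, a hypothetical two-pair config in that shape might look like the snippet below. The hostnames and paths are invented, and the whitespace separator between the two specs is an assumption; only the `host:port:/path` shape of each side comes from the description above.

```
linux03.di.uoa.gr:9000:/home/users/alice/data   linux07.di.uoa.gr:9000:/home/users/alice/data_backup
localhost:9100:/tmp/src_dir                     localhost:9200:/tmp/dst_dir
```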
```
┌─────────────────────────────────────┐
│             nfs_manager             │
│        (thread pool, central)       │
│                                     │
│   ┌──────────────────┐              │
│   │    TaskQueue     │  mutex+cond  │
│   │   (file copies)  │              │
│   └────────┬─────────┘              │
│            │                        │
│   ┌────────┴─────────┐              │
│   │  worker threads  │              │
│   │   fetch_task →   │              │
│   │   complete_task  │              │
│   └──────────────────┘              │
│       │            │                │
│       ▼            ▼                │
│    PULL req    PUSH req             │
└───────┼────────────┼────────────────┘
        │            │
 ┌──────┘            └──────┐
 ▼                          ▼
┌────────────────┐   ┌────────────────┐
│   nfs_client   │   │   nfs_client   │
│   on host A    │   │   on host B    │
│ (multi-thread, │   │ (multi-thread, │
│  MAX_CLIENTS)  │   │  MAX_CLIENTS)  │
└────────────────┘   └────────────────┘
```
Console commands: `add <source-spec> <dest-spec>`, `cancel <source-spec>`, `shutdown`.
The biggest design call vs. HW1's process pool: threads, not forks, because the work is now I/O-bound on remote sockets rather than filesystem-bound. The thread-pool implementation lives inside `TaskQueue` itself; the same module owns:
- The buffer of pending tasks
- The mutex protecting the buffer
- Two condition variables (one for "queue not full", one for "queue not empty") so worker threads sleep cleanly instead of spinning
- The worker-thread set itself
When you create a `TaskQueue` you pass it a `complete_task` function pointer. That function is the work each worker executes per task. For this project, `complete_task` lives in `nfs_manager.c` (it needs access to manager-internal helpers like `read_line`), so the queue stays generic but the work is project-specific.
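A minimal sketch of what that can look like; the names and fields below are assumptions for illustration, not the project's actual `TaskQueue.h`. It just gathers the pieces the text describes: one module owning the bounded buffer, the lock, both condition variables, the worker threads, and the injected `complete_task` callback.

```c
#include <pthread.h>

/* Hypothetical task: one file copy between two specs. */
typedef struct {
    char source_spec[256];               /* host:port:/path to copy from */
    char dest_spec[256];                 /* host:port:/path to copy to   */
} Task;

/* Hypothetical queue: the thread pool lives in the same struct. */
typedef struct {
    Task            *buffer;             /* bounded buffer of pending tasks      */
    int              capacity, count, head, tail;
    pthread_mutex_t  lock;               /* protects the buffer fields above     */
    pthread_cond_t   not_full;           /* producer sleeps here when queue full */
    pthread_cond_t   not_empty;          /* workers sleep here when queue empty  */
    pthread_t       *workers;            /* the worker-thread set                */
    int              num_workers;
    void           (*complete_task)(Task *);  /* project-specific work per task  */
} TaskQueue;

/* Hypothetical constructor: spawns num_workers threads that loop
 * fetch_task -> complete_task until the queue is drained and closed. */
TaskQueue *taskqueue_create(int capacity, int num_workers,
                            void (*complete_task)(Task *));
```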
PUSH (manager → destination client) sends file chunks. Two design options for connection handling:
- (a) Open a fresh TCP connection per PUSH
- (b) Open one connection per (source, destination) pair, hold it open across many PUSHes
I went with (b). The chunked-protocol design only makes sense if you reuse the connection — otherwise the chunk size is essentially irrelevant. Side benefit: logs show one PUSH log line per file with total bytes, instead of N noisy entries.
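The gist of option (b) in code, as a hedged sketch: the socket below is assumed to be already connected and reused across many files, and the `PUSH <path>` header line is illustrative framing, not the project's actual wire format. `CHUNK_SIZE` stands in for the value in `Configuration.h`.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define CHUNK_SIZE 4096   /* stand-in for the chunk size in Configuration.h */

/* Send exactly len bytes, retrying short writes. */
static ssize_t write_all(int fd, const void *buf, size_t len) {
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, (const char *)buf + sent, len - sent, 0);
        if (n <= 0) return -1;
        sent += (size_t)n;
    }
    return (ssize_t)sent;
}

/* One PUSH per file, many chunks per PUSH, all on the same long-lived socket. */
int push_file(int sock, const char *dest_path, FILE *src) {
    char header[512], chunk[CHUNK_SIZE];
    size_t n, total = 0;

    snprintf(header, sizeof header, "PUSH %s\n", dest_path);   /* illustrative framing only */
    if (write_all(sock, header, strlen(header)) < 0) return -1;

    while ((n = fread(chunk, 1, sizeof chunk, src)) > 0) {
        if (write_all(sock, chunk, n) < 0) return -1;
        total += n;
    }
    fprintf(stderr, "PUSH %s: %zu bytes\n", dest_path, total); /* one log line per file */
    return ferror(src) ? -1 : 0;
}
```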
PULL (manager → source client) is treated as failed if the corresponding PUSH fails — once the destination side has rejected the data, there's no point continuing to fetch.
Each `nfs_client` accepts multiple inbound connections concurrently: one thread per connection, up to `MAX_CLIENTS` (defined in `Configuration.h`). Past that, new connections wait. This was important for the cross-machine tests: a single client can be both a source for one pair and a destination for another, simultaneously.
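One way to get that "spawn a thread per connection, but make extras wait" behaviour is a counting semaphore around `accept()`. The sketch below is illustrative only and not the actual `nfs_client.c`; the handler body and the semaphore approach are assumptions.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_CLIENTS 8              /* stand-in for the value in Configuration.h */

static sem_t slots;                /* counts free connection slots */

static void *serve_connection(void *arg) {
    int conn = (int)(intptr_t)arg;
    /* ... parse and answer LIST / PULL / PUSH requests on conn ... */
    close(conn);
    sem_post(&slots);              /* release the slot so a waiting connection proceeds */
    return NULL;
}

void accept_loop(int listen_fd) {
    sem_init(&slots, 0, MAX_CLIENTS);
    for (;;) {
        sem_wait(&slots);          /* once MAX_CLIENTS are active, block here */
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0) { sem_post(&slots); continue; }
        pthread_t tid;
        pthread_create(&tid, NULL, serve_connection, (void *)(intptr_t)conn);
        pthread_detach(tid);       /* no join needed; each thread cleans up after itself */
    }
}
```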
The manager starts processing tasks during initialization (not after), because the TaskQueue has a max size. If we waited until all initial tasks were enqueued before kicking off workers, the queue could fill and we'd deadlock. Instead the main thread enqueues work while worker threads pull from the same queue — the standard producer-consumer pattern.
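The ordering matters more than the code, but here is the shape of it, continuing the hypothetical `TaskQueue` sketch above; `read_config_pair` and `taskqueue_enqueue` are also assumed names, not the project's real functions.

```c
#include <stdio.h>

/* Assumed helpers for this sketch. */
int  read_config_pair(FILE *cfg, Task *out);         /* parse one "src dst" config line */
void taskqueue_enqueue(TaskQueue *q, const Task *t);  /* blocks while the queue is full  */

void manager_init(FILE *cfg, int bufsize, int num_workers,
                  void (*complete_task)(Task *)) {
    /* Workers start consuming immediately... */
    TaskQueue *q = taskqueue_create(bufsize, num_workers, complete_task);

    /* ...while the main thread keeps producing: classic bounded-buffer
     * producer/consumer, so a long config can't deadlock the -b sized queue. */
    Task t;
    while (read_config_pair(cfg, &t) == 0)
        taskqueue_enqueue(q, &t);
}
```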
Console commands during the main loop:
- `add`: same path as initialization (open LIST against the source, enqueue file copies)
- `cancel`: drain queued tasks for that source, but don't interrupt an in-flight task (a worker that's mid-copy completes; the queue just stops feeding it more)
- `shutdown`: graceful drain (stop accepting new commands, finish queued tasks, exit)
The initial submission required raw IP addresses (e.g., 195.134.65.78). The current code does proper hostname resolution, so `linux07.di.uoa.gr` works.
| File | Purpose |
|---|---|
| `src/nfs_manager.c` | Event loop, console command handling, `complete_task` logic |
| `src/nfs_console.c` | CLI frontend (`add`, `cancel`, `shutdown`) |
| `src/nfs_client.c` | Per-host file server: handles LIST, PULL, PUSH requests |
| `src/modules/TaskQueue.c` | Generic thread pool + bounded task queue (mutex + condvars) |
| `src/modules/DirList.c` | List of watched directory pairs (carried over from HW1) |
| `includes/Configuration.h` | `MAX_CLIENTS`, chunk size, port defaults |
```sh
make                                    # produces nfs_manager, nfs_console, nfs_client

# on each host:
./nfs_client -p 9000 -l client_log.txt &

# on the orchestrator:
./nfs_manager -c config_file -l manager_log.txt -n 4 -b 100 &
./nfs_console -l console_log.txt
```

A real run from the UoA DI lab machines (`console_example.png`):
The first two commands in that screenshot demonstrate error handling — invalid command, then an add with the wrong port that fails the sync with a client timeout. The directory entry still gets added to DirList, but no copies are scheduled.
Part of a two-piece System Programming arc:
- syspro-hw1: single-machine, process-pool, inotify-driven
- syspro-hw2 (you are here): distributed, thread-pool, custom TCP protocol. Reuses HW1's `TaskQueue` and `DirList` modules.
MIT — applies to my own code in this repo. Assignment-distributed materials retain their original course copyright.
