LLD Implementation Pointers (for Name Server & Storage Server)

1) Name Server (NS) — LLD Pointers

1.1 Process model & directories

Single process.
Single system directory (e.g., /var/nfsys).
NS does not store file contents:
- Files’ bytes live only on Storage Servers (SS).
- NS stores metadata: users, files, ACLs, placements, version hints, plus any in-memory caches it wants.

1.2 Threads, worker model & load balancing

1.2.1 Threads

Acceptor thread:
- Calls accept() on:
  - Client port (7000)
  - SS-Control port (7001)
- For each new connection, chooses a worker and hands over the fd.
Two worker classes, both using epoll:
1. Client-NS workers
  - Handle Client connections on port 7000.
  - Decode client requests (LOGIN/VIEW/INFO/CREATE/DELETE/ACCESS/… and *_TICKET ops).
  - For any request that requires contacting an SS (CREATE/DELETE/FETCH/SYNC, etc.), they enqueue a message to the appropriate NS-SS worker.
2. NS-SS workers
  - Handle persistent SS-Control connections on port 7001.
  - Own SS-Control fds and run epoll on them.
  - Send NS→SS control messages, read SS→NS responses, and forward results back to the originating Client-NS worker.
Once an fd is assigned to a worker, it stays on that worker until closed.

1.2.2 Load balancing (threshold doubling)

Maintain assigned_fds[worker_id] separately for:
- Client-NS worker pool, and
- NS-SS worker pool.
Initialize a threshold T = 2 per pool.
For each newly accepted connection in a pool:
- Scan workers in that pool round-robin and pick one with < T FDs.
- Assign the fd to that worker and increment its count.
- If all workers in that pool have ≥ T FDs, set T ← 2T for that pool and rescan, so the new connection is assigned under the doubled threshold.
Keep a global map fd → worker_id (and fd → pool_type) for debugging/metrics and routing.
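The threshold-doubling scan above can be sketched as follows. This is a minimal illustration rather than the required implementation: the class name `PoolBalancer` and its methods are hypothetical, one instance is assumed per pool, and it is assumed to be called only from the acceptor thread (so no locking). Decrementing the count when an fd closes is an assumption; the text above only specifies the increment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one instance per pool (Client-NS or NS-SS).
class PoolBalancer {
public:
    explicit PoolBalancer(std::size_t num_workers)
        : assigned_fds_(num_workers, 0) {
        assert(num_workers > 0);  // the scan below never ends on an empty pool
    }

    // Picks a worker for a newly accepted fd and returns its worker_id.
    std::size_t assign() {
        const std::size_t n = assigned_fds_.size();
        for (;;) {
            // Round-robin scan starting just after the last chosen worker.
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t w = (next_ + i) % n;
                if (assigned_fds_[w] < threshold_) {
                    ++assigned_fds_[w];
                    next_ = (w + 1) % n;
                    return w;
                }
            }
            threshold_ *= 2;  // every worker holds >= T fds: T <- 2T, rescan
        }
    }

    // Assumption: called when an fd owned by worker_id is closed.
    void release(std::size_t worker_id) {
        if (assigned_fds_[worker_id] > 0) --assigned_fds_[worker_id];
    }

    std::size_t threshold() const { return threshold_; }
    std::size_t load(std::size_t worker_id) const { return assigned_fds_[worker_id]; }

private:
    std::vector<std::size_t> assigned_fds_;  // assigned_fds[worker_id]
    std::size_t threshold_ = 2;              // T = 2 initially
    std::size_t next_ = 0;                   // round-robin cursor
};
```

With two workers, the first four connections fill both workers to T = 2; the fifth forces T to 4 and lands on worker 0.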

1.2.3 Per-thread queues (required for NS↔SS routing)

Each worker has:

InboundQ: queue of “work items” this worker should process.
OutboundQ: queue of “work items” this worker wants some other worker to process.

Queues can be implemented as:

std::queue<T> + std::mutex + std::condition_variable (simple and fine).
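A minimal sketch of such a queue is shown here, assuming C++17 (for `std::optional`). The name `WorkQueue` and the `try_pop` method are illustrative and not part of the design above.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Hypothetical thread-safe queue usable as InboundQ or OutboundQ.
template <typename T>
class WorkQueue {
public:
    // Called by the producing worker.
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            q_.push(std::move(item));
        }
        cv_.notify_one();  // notify after unlocking to avoid a needless wakeup stall
    }

    // Blocks until an item is available.
    T pop() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        T item = std::move(q_.front());
        q_.pop();
        return item;
    }

    // Non-blocking variant: returns std::nullopt when the queue is empty.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> lock(mu_);
        if (q_.empty()) return std::nullopt;
        T item = std::move(q_.front());
        q_.pop();
        return item;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<T> q_;
};
```

One design point to keep in mind: a worker blocked in epoll_wait() is not woken by a condition variable. A common approach is to register an eventfd (or pipe) with the worker's epoll set, have the producer write to it after push(), and drain the queue with try_pop() when it becomes readable.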

Patterns:

When a Client-NS worker needs to call an SS (e.g., CREATE on SS #7):
1. It builds a work item:
  { server_id, target_fd, opcode, protobuf_payload, client_fd, corr_id }.
2. It pushes this onto the InboundQ of the NS-SS worker that owns the SS #7 control fd.
3. The NS-SS worker’s event loop:
  - dequeues the work item,
  - writes the framed message to the SS-Control fd,
  - and later, when a response arrives on that fd, decodes it and pushes a result work item to the originating Client-NS worker’s InboundQ.
When an NS-SS worker has a response for a client:
- It constructs a result item (including client_fd and corr_id) and pushes it to the relevant Client-NS worker’s InboundQ.
- The Client-NS worker then completes the original client request and writes the final response to the client fd.

The queues are entirely inside NS. SS still sees a normal persistent control connection and standard NS⇄SS messages defined in the HLD.

1.3 Connection types & state machines

There are two kinds of external connections:

Client connections (Client⇄NS on port 7000), owned by Client-NS workers.
SS-Control connections (SS⇄NS on port 7001), owned by NS-SS workers.

1.3.1 Client FDs (port 7000)

Per-FD state:
- CONNECTING → LOGGED_IN (user_id bound) → READY → CLOSING
Per-request tracking:
- Maintain a map:
  (fd, CorrId) → { opcode, decode_state, payload_buf, deadline, route }
- route includes any info needed to map responses from SS back to this client request (e.g. which server_id we called, which NS-SS worker we used).
STOP / session timeout logic:
- Each client connection has a login_time.
- If a Client sends F_STOP, or 30 minutes have elapsed since LOGIN on that fd:
  - Reject new requests from that fd (respond with SESSION_EXPIRED / SESSION_TIMEOUT).
  - Allow all tracked (fd, CorrId) to finish normally.
  - Once no in-flight requests remain, transition the FD state to CLOSING and close the fd.
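The STOP / timeout drain described above could be tracked per client fd roughly like this. It is a sketch under assumptions: the class `ClientSession` and its method names are hypothetical, CorrId is assumed to fit in 64 bits, and the caller is assumed to send the SESSION_EXPIRED / SESSION_TIMEOUT response whenever begin_request() returns false.

```cpp
#include <chrono>
#include <cstdint>
#include <unordered_set>

// Hypothetical per-fd session state kept by the owning Client-NS worker.
class ClientSession {
public:
    using Clock = std::chrono::steady_clock;

    // 30-minute limit measured from LOGIN, as stated above.
    static constexpr std::chrono::minutes session_limit() {
        return std::chrono::minutes(30);
    }

    explicit ClientSession(Clock::time_point login_time)
        : login_time_(login_time) {}

    // Client sent F_STOP.
    void on_stop() { stop_requested_ = true; }

    // True once no new requests may be accepted on this fd.
    bool expired(Clock::time_point now) const {
        return stop_requested_ || (now - login_time_) >= session_limit();
    }

    // Returns false if the request must be rejected; otherwise tracks it.
    bool begin_request(std::uint64_t corr_id, Clock::time_point now) {
        if (expired(now)) return false;
        in_flight_.insert(corr_id);
        return true;
    }

    // Called when the final response for corr_id has been written.
    void finish_request(std::uint64_t corr_id) { in_flight_.erase(corr_id); }

    // True when the fd can move to CLOSING and be closed.
    bool ready_to_close(Clock::time_point now) const {
        return expired(now) && in_flight_.empty();
    }

private:
    Clock::time_point login_time_;
    bool stop_requested_ = false;
    std::unordered_set<std::uint64_t> in_flight_;  // CorrIds still in progress
};
```

Passing `now` in explicitly keeps the logic independent of the wall clock, which makes the 30-minute boundary easy to exercise in tests.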

1.3.2 SS-Control FDs (port 7001)

These are persistent control connections from SS to NS, owned by NS-SS workers.
Per-FD state:
- CONNECTING → REGISTERED (after OP_SS_REGISTER) → READY → CLOSING
NS-SS workers:
- Read:
  - OP_SS_REGISTER as the first message, to bind server_id and endpoints.
  - OP_SS_HEARTBEAT periodically.
  - OP_SS_FILE_UPDATE_NOTIFY, OP_SS_SYNC_RESULT, etc.
- Write:
  - OP_SS_CREATE, OP_SS_DELETE, OP_SS_LIST_FILES, OP_SS_FETCH_FILE, OP_SS_FILE_METADATA_REQUEST, OP_SS_SYNC_REQUEST, etc., based on work items from Client-NS workers.
Failures & timeouts:
- If heartbeats from a given SS stop for longer than allowed, or any fatal socket error occurs:
  - Mark that server_id as unavailable in NS metadata.
  - Move the SS-Control fd to CLOSING and close it.
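Heartbeat expiry can be checked with a small tracker like the one below. This is only a sketch: `HeartbeatTracker` is a hypothetical name, server_id is assumed to be a 32-bit integer, and the allowed silence interval is a constructor parameter because the text above does not fix a value.

```cpp
#include <chrono>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper owned by an NS-SS worker.
class HeartbeatTracker {
public:
    using Clock = std::chrono::steady_clock;

    explicit HeartbeatTracker(Clock::duration max_silence)
        : max_silence_(max_silence) {}

    // Record OP_SS_REGISTER or OP_SS_HEARTBEAT from server_id.
    void on_heartbeat(std::uint32_t server_id, Clock::time_point now) {
        last_seen_[server_id] = now;
    }

    // Removes and returns every server_id silent for longer than max_silence.
    // The caller then marks each one unavailable and closes its control fd.
    std::vector<std::uint32_t> reap_expired(Clock::time_point now) {
        std::vector<std::uint32_t> dead;
        for (auto it = last_seen_.begin(); it != last_seen_.end();) {
            if (now - it->second > max_silence_) {
                dead.push_back(it->first);
                it = last_seen_.erase(it);  // erase returns the next iterator
            } else {
                ++it;
            }
        }
        return dead;
    }

private:
    Clock::duration max_silence_;
    std::unordered_map<std::uint32_t, Clock::time_point> last_seen_;
};
```

A worker would typically call reap_expired() on a periodic tick, for example by giving epoll_wait() a timeout.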
