Network Observability Lab

This repository contains the resources for building and managing an observability stack within a network lab environment, specifically designed for the "Modern Network Observability" book. It includes scripts, configuration files, and documentation to set up and operate various observability tools like Prometheus, Grafana Loki, and others, helping you implement and learn about network observability practices in a practical, hands-on manner.

The repository includes all the lab scenarios from the book, which progressively cover topics from metrics and logs collection all the way to leveraging AI for improving observability practices. More specifically:

  • Data Collection Methods (Chapter 3): Learn different ways to gather network data, like using SNMP and gNMI.
  • Metrics and Logs Collection (Chapter 5): Collect important metrics and logs from network devices with tools like Telegraf and Logstash.
  • Data Normalization, Enrichment, and Distribution (Chapter 6): Transform raw data into useful formats and share it across systems.
  • Storage and Querying with PromQL and LogQL (Chapter 7): Store data and use powerful query languages to search through metrics and logs.
  • Visualization (Chapter 8): Create dashboards and reports to make data easy to understand using Grafana.
  • Alerting (Chapter 9): Set up alerts to monitor and quickly respond to network problems.
  • Scripts and Event-Driven Automation with Observability Events and Data (Chapter 12): Automate reports and actions based on observability data by developing scripts and CLI tools and by using event-driven systems.
  • AI for Enhanced Observability (Chapter 13): Use AI to predict problems, find anomalies, and improve network management.

Requirements

The lab environments set up a small network and an attached observability stack. They were developed and tested on Debian-based Linux systems. We provide two ways to get started:

  • Linux machine (local or remote): automated setup via an Ansible playbook — see Linux setup.
  • DigitalOcean droplet: fully automated provisioning and configuration — see the setup README.

Both paths install all dependencies (Docker, Containerlab, Python environment) and deploy the lab automatically. The Arista cEOS image is only required for scenarios that use it (e.g. batteries-included). You can download it after registering at arista.com.

Quickstart — Linux

This is the recommended path if you already have a Linux machine (local or remote). It installs all dependencies and deploys the lab in one command.

  1. Clone the repository and enter it:
git clone https://github.com/network-observability/network-observability-lab.git
cd network-observability-lab
  2. Copy and edit the environment file:
cp example.env .env
# Edit .env with your credentials and settings
  3. Install Ansible and run the bootstrap playbook:
pip install ansible
ansible-playbook setup/setup_linux.yml -e "lab_scenario=batteries-included"

For a specific scenario and topology:

ansible-playbook setup/setup_linux.yml \
  -e "lab_scenario=webinar" \
  -e "lab_topology_file=./chapters/webinar/containerlab/lab.yml" \
  -e "lab_vars_file=./chapters/webinar/containerlab/lab_vars.yml"

This installs Docker, Containerlab, and the netobs tool, then deploys the lab and loads Nautobot data automatically.

  4. Once setup completes, use netobs for day-to-day operations:
netobs lab deploy --scenario batteries-included
netobs docker logs telegraf-01 --follow
netobs lab destroy --scenario batteries-included
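
The bootstrap steps above depend on a few files being in place: the .env copy and, for a specific scenario, the topology and variables files passed to the playbook. A quick pre-flight check can surface a mistyped path before Ansible starts. The following is a minimal sketch rather than anything shipped in the repository: missing_files is a hypothetical helper name, and treating exactly these three files as prerequisites is an assumption based on the commands shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of netobs): report expected files that are absent.
# Prints one "missing: PATH" line per absent file and returns 1 if any is absent.
missing_files() {
  status=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      printf 'missing: %s\n' "$f"
      status=1
    fi
  done
  return "$status"
}

# Run from the repository root; the paths are the ones used in the webinar example.
if missing_files .env \
    ./chapters/webinar/containerlab/lab.yml \
    ./chapters/webinar/containerlab/lab_vars.yml; then
  echo "All expected files found; the bootstrap playbook can be run."
else
  echo "Fix the paths listed above before running ansible-playbook."
fi
```

The check only tests that the files exist; it does not validate their contents.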

NOTE: Our lab comes with a batteries-included setup, providing you with everything you need to get started with network observability right away. Head over to the instructions section to begin!

Quickstart — DigitalOcean

For fully automated remote provisioning on DigitalOcean, refer to the setup README. This path creates a droplet, installs all dependencies, and deploys the lab automatically using netobs setup deploy.


Managing Lab Environment with netobs

The netobs utility simplifies managing and monitoring the network lab and observability stack set up in this repository. Its commands cover environment setup, the Containerlab topology, the Docker Compose stacks, and helper actions against the lab devices.

Top-Level Commands

The netobs utility includes five main commands to help manage the environment:

  • netobs setup: Manages the setup of the lab environment. Use netobs setup linux to bootstrap a local Linux machine, or netobs setup deploy to provision and configure a remote DigitalOcean droplet.

  • netobs containerlab: Manages the pre-configured Containerlab setup. All lab scenarios presented in the chapters run on this network lab configuration.

  • netobs docker: Manages the Docker Compose setups for each lab scenario. It ensures the appropriate containers are running for each specific lab exercise.

  • netobs lab: A wrapper utility that combines netobs containerlab and various netobs docker commands to perform major actions. For example:

    • netobs lab purge: Cleans up all running environments.
    • netobs lab prepare --scenario ch7: Purges any scenario that is up and prepares the environment for Chapter 7.

  • netobs utils: Contains utility commands for interacting with the lab environment. This includes scripts for enabling/disabling an interface on a network device to simulate interface flapping and other useful actions.

Example Usage

The netobs lab deploy command builds and starts a Containerlab environment along with the observability stack, so that every component of the chosen lab scenario is up and running.

# Start the network lab
❯ netobs lab deploy --scenario batteries-included --sudo
[21:50:42] Deploying lab environment
           Network create: network-observability
           Running command: docker network create --driver=bridge  --subnet=198.51.100.0/24 network-observability
           Successfully ran: network create
─────────────────────────────────────────────────── End of task: network create ────────────────────────────────────────────────────

           Deploying containerlab topology
           Topology file: containerlab/lab.yml
           Running command: sudo containerlab deploy -t containerlab/lab.yml
INFO[0000] Creating container: "ceos-01"
INFO[0000] Creating container: "ceos-02"
INFO[0001] Creating virtual wire: ceos-01:eth2 <--> ceos-02:eth2
INFO[0001] Creating virtual wire: ceos-01:eth1 <--> ceos-02:eth1
+---+---------+--------------+----------------+------+---------+------------------+--------------+
| # |  Name   | Container ID |     Image      | Kind |  State  |   IPv4 Address   | IPv6 Address |
+---+---------+--------------+----------------+------+---------+------------------+--------------+
| 1 | ceos-01 | d59629fbbdc0 | ceos:4.28.5.1M | ceos | running | 198.51.100.11/24 | N/A          |
| 2 | ceos-02 | 80854bfd7e08 | ceos:4.28.5.1M | ceos | running | 198.51.100.12/24 | N/A          |
+---+---------+--------------+----------------+------+---------+------------------+--------------+
[21:51:14] Successfully ran: Deploying containerlab topology
─────────────────────────────────────────── End of task: Deploying containerlab topology ───────────────────────────────────────────

           Running command: docker compose --project-name netobs -f chapters/docker-compose.yml --verbose up -d --remove-orphans
[+] Building 0.0s (0/0)
[+] Running 10/10
 ✔ Volume "netobs_grafana-01_data"     Created                                                                                 0.0s
 ✔ Volume "netobs_prometheus-01_data"  Created                                                                                 0.0s
 ✔ Container netobs-grafana-01-1       Started                                                                                 0.7s
 ✔ Container netobs-prometheus-01-1    Started                                                                                 1.3s
 ✔ Container netobs-telegraf-02-1      Started                                                                                 1.0s
[21:51:16] Successfully ran: start stack
───────────────────────────────────────────────────── End of task: start stack ─────────────────────────────────────────────────────
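
In the output above, the Docker network is created with --subnet=198.51.100.0/24 and the two cEOS nodes come up with 198.51.100.11/24 and 198.51.100.12/24. When a node is not reachable from the observability stack, one thing worth checking is whether its management address actually falls inside that subnet. The sketch below works through that arithmetic; ip_to_int and in_subnet are illustrative helper names and are not part of netobs or Containerlab.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a single integer.
# The body runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# in_subnet IP CIDR: succeed when IP lies inside the CIDR block.
in_subnet() {
  ip=$1
  net=${2%/*}    # network part, e.g. 198.51.100.0
  bits=${2#*/}   # prefix length, e.g. 24
  mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
  [ $(( $(ip_to_int "$ip") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}

# Addresses copied from the containerlab table in the output above.
for addr in 198.51.100.11/24 198.51.100.12/24; do
  if in_subnet "${addr%/*}" 198.51.100.0/24; then
    echo "${addr%/*} is inside 198.51.100.0/24"
  else
    echo "${addr%/*} is outside 198.51.100.0/24"
  fi
done
```

The comparison masks both addresses with the prefix length and checks that the network parts match, which is the same test a router applies when deciding whether a destination is on-link.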

Lab Scenarios

The chapters/ folder contains a collection of lab scenarios designed to help you explore modern network observability techniques using open-source tools. These scenarios are directly aligned with the chapters of the book.

Each practical chapter provides two lab scenarios:

  1. Skeleton Scenario (ch<number>): This scenario includes only the bare minimum setup required to follow along with the exercises in the corresponding chapter of the book.
  2. Completed Scenario (ch<number>-completed): This scenario comes fully configured, with all components set up as described in the chapter.
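
The naming convention above makes scenario names easy to derive from a chapter number. As a small sketch, the loop below prints the netobs lab prepare command for both scenarios of each practical chapter. It is a dry run that only echoes the commands; scenario_names is a hypothetical helper, and whether every generated name exists under chapters/ is an assumption that should be checked against the repository.

```shell
#!/bin/sh
# Practical chapters listed in this README.
chapters="3 5 6 7 8 9 12 13"

# Print the skeleton and completed scenario names for one chapter number.
scenario_names() {
  printf 'ch%s\nch%s-completed\n' "$1" "$1"
}

for chapter in $chapters; do
  for scenario in $(scenario_names "$chapter"); do
    # Dry run: print the command instead of executing it.
    echo "netobs lab prepare --scenario $scenario"
  done
done
```

Removing the echo would run the commands for real, one scenario after another, which is rarely what you want; the listing is mainly useful for checking names before preparing a single scenario.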

Overview of Practical Chapters

Here is a brief overview of the practical chapters and the key concepts you will encounter:

  • Chapter 3 - Network Observability Data: This chapter explores various methods and techniques to obtain operational data from network devices using popular Python libraries and other low-level tools. It covers protocols such as SNMP, gNMI, SSH CLI parsing, REST APIs, eBPF and more.

  • Chapter 5 - Data Collectors: Building on the concepts from Chapter 3, this chapter introduces tools like Telegraf and Logstash, which are widely used in production environments to collect metrics and syslog data from network devices.

  • Chapter 6 - Data Distribution and Processing: This chapter delves deeper into configuring Telegraf and Logstash to normalize and enrich the collected data. It also introduces message brokers such as Kafka for handling data in larger environments, with practical examples included in the lab.

  • Chapter 7 - Data Storage with Prometheus and Loki: This chapter focuses on using Prometheus to scrape, store, and analyze normalized and enriched metrics from Telegraf. It includes practical examples of PromQL queries to extract meaningful insights from your network data. Additionally, the chapter covers Loki for log data storage and retrieval using LogQL, as well as the implementation of recording rules in both systems to optimize query performance and precompute frequent calculations.

  • Chapter 8 - Data Visualization: This chapter centers on Grafana, demonstrating how to create panels and dashboards to visualize the data collected from the network.

  • Chapter 9 - Alerting: This chapter dives into generating alerts with Prometheus and Loki based on the collected data. It introduces Alertmanager, which manages the routing of alerts to different destinations, including integration with Keep for alert and incident management workflows.

  • Chapter 12 - Automation with Observability Data: This chapter delves into leveraging automation tools, such as Prefect, to streamline and automate day 2 operations using your network’s observability data. It highlights how automation can enhance efficiency, reduce manual effort, and improve the reliability of ongoing network management tasks.

  • Chapter 13 - Machine Learning and AI: This chapter explores how machine learning and AI techniques can enhance your observability practices. It covers basic forecasting, AI-driven Root Cause Analysis (RCA), and advanced anomaly detection.

  • Batteries Included Scenario: This scenario brings everything together in a fully configured environment that shows the tools working as one stack. The batteries-included scenario README gives an overview and a detailed explanation of the setup.
