- version: 3.0.1
- date: 2019-06-05
- authors: Simone Basso
This document explains how OONI Probe works. It is versioned 3.0.0+ because version 2.0.0 is considered to be described by the existing implementations, whereas this version is meant to be the reference that upcoming code changes should match.
This document is an introduction for readers interested in the OONI-verse. We will strive to keep it current, but it will inevitably age faster than the more specific specifications. Please let us know if parts of this document have become obsolete without us noticing.
The probe is the software running network tests (aka nettests). The probe is an app for mobile or desktop. Current implementations are:
- github.com/ooni/probe-android for Android devices, written in Java;
- github.com/ooni/probe-ios for iOS devices, written in Objective-C;
- github.com/ooni/probe-legacy for Desktop (legacy implementation), written in Python;
- github.com/ooni/probe-cli, the command line interface for Desktop (new implementation);
- github.com/ooni/probe-desktop, the graphical user interface for Desktop (new implementation), which is based on probe-cli.
The engine is the piece of code running nettests. A specific implementation of the probe uses an engine. Current implementations are:
- github.com/measurement-kit/measurement-kit, the C++ engine used for probe-android, probe-ios, and probe-cli;
- github.com/ooni/probe-engine, an experimental Go engine containing code that is not practical to write in C++, which will be used by probe-android, probe-ios, and probe-cli;
- github.com/ooni/probe-legacy, which contains its own engine written in Python.
The operations discussed here are valid for all implementations.
The orchestra is a set of servers used to provide probes with input for automatic network tests. This is currently experimental.
The geolookup is a set of servers and databases used to discover the probe's IP, ASN (autonomous system number), CC (country code), and network name (name of the entity owning the ASN).
The bouncer is a set of servers used by the probe to discover the collector and the test helper.
The collector is a set of servers to which the probe submits the results of nettests.
The test helpers are a set of servers used to perform specific nettests. Their specifications are available as part of this repository. We only consider as test helpers the servers that are under OONI control. As we will see later, other servers that we don't control are also part of our testing (e.g., when we test a specific URL for censorship, the server being tested is obviously part of the testing process but is most likely not under our control).
Nettests are either initiated by the user or initiated automatically when using the orchestra. Interaction (0) describes the probe communicating with the orchestra to get information, such as which test to run and with which input. Users can choose whether to enable the orchestra. The specific policy for doing that depends on the app. (As of this writing, we have not finished implementing all of the orchestra.)
Discovering the input for the test is also part of the orchestra. For example, there is an orchestra endpoint for discovering the list of URLs that need to be tested when performing Web Connectivity tests. We aim to use this functionality to decide which URLs to test, rather than using static URLs shipped inside the mobile and desktop apps.
When the test name and its input are known, we can move forward with the following steps.
The engine contacts the bouncer, as shown in interaction (1). This will tell the engine the available collectors and test helpers.
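As a rough illustration of what the engine does with the bouncer's answer, the sketch below picks one collector and one test helper from a reply. The reply layout, the field names (`collectors`, `test_helpers`, `type`, `address`), and the example addresses are all assumptions made for this example; the bouncer specification defines the actual format.

```python
from typing import Optional

# Hypothetical bouncer reply; field names and addresses are illustrative only.
BOUNCER_REPLY = {
    "collectors": [
        {"type": "onion", "address": "httpo://collector.example.onion"},
        {"type": "https", "address": "https://collector.example.org"},
    ],
    "test_helpers": {
        "web-connectivity": [
            {"type": "https", "address": "https://helper.example.org"},
        ],
    },
}


def pick_by_type(entries: list, wanted_type: str) -> Optional[str]:
    """Return the address of the first entry of the wanted type, if any."""
    for entry in entries:
        if entry.get("type") == wanted_type:
            return entry.get("address")
    return None


def discover(reply: dict, helper_name: str, wanted_type: str = "https"):
    """Select one collector and, when available, one test helper."""
    collector = pick_by_type(reply.get("collectors", []), wanted_type)
    helpers = reply.get("test_helpers", {}).get(helper_name, [])
    helper = pick_by_type(helpers, wanted_type)
    return collector, helper


collector, helper = discover(BOUNCER_REPLY, "web-connectivity")
print(collector, helper)
```

A nettest that needs no test helper would simply ignore the second value, which is `None` when the reply lists no helper under the requested name.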
Unless configured to skip this step, the engine will perform a geolookup as shown in interaction (2). The purpose of the geolookup is to learn the user's IP address and the information that can be derived from it, such as the ASN and the CC. By default the IP address is not included in the report; knowing it allows the engine to attempt to scrub it from the results whenever the user has not asked for it to be included. In this document we don't get into the details of our Data Policy, which you can read separately; when in doubt, the Data Policy always takes precedence over this document, which is mainly meant to explain to new developers how all the pieces fit together.
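One simple way to attempt the scrubbing mentioned above is to serialize the measurement and replace every occurrence of the probe IP with a placeholder. The sketch below assumes this approach; the placeholder string, the function name `scrub_ip`, and the measurement fields are hypothetical, and actual engines may proceed differently.

```python
import json

# Placeholder written in place of the probe IP; the exact string is an assumption.
SCRUBBED = "[scrubbed]"


def scrub_ip(measurement: dict, probe_ip: str) -> dict:
    """Return a copy of the measurement with every occurrence of probe_ip replaced."""
    serialized = json.dumps(measurement)
    return json.loads(serialized.replace(probe_ip, SCRUBBED))


# Hypothetical measurement in which a server echoed the probe IP in its response body.
measurement = {
    "probe_ip": "203.0.113.7",
    "test_keys": {"body": "Your address is 203.0.113.7"},
}
clean = scrub_ip(measurement, "203.0.113.7")
print(clean)
```

Plain substring replacement is only an attempt: it also rewrites longer addresses that happen to start with the probe IP, and it misses the address when a server returns it in a different encoding.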
At this point, the engine will contact the collector, interaction (3), to open a report for the specific nettest. This means that the collector will be prepared for receiving and storing the results of the nettest. In code terms, this means the collector will tell the engine the ID of the report, to be used to submit measurements as part of this report.
When the report is open, the engine will perform the nettest. This is modeled by interaction (4). Depending on the nettest, there may or may not be inputs, and test helpers may or may not be used. Two examples:
- If you run Web Connectivity, this will require one or more URLs as input. The engine will access those URLs and also ask a specific test helper to access them. The comparison between the engine's measurement and the test helper's measurement will become the result of the web measurement;
- If you run an NDT test, there will be no input and no OONI-controlled test helper. However, the test will measure the performance between the engine and a measurement server (which we don't consider a test helper because it is not directly controlled by OONI, but rather is provided by Measurement Lab). The performance measurements will be included in the results.
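To make the Web Connectivity example more concrete, the sketch below compares what the probe saw with what a test helper saw for the same URL. The `Observation` fields, the result keys, and the `0.7` body-length threshold are assumptions chosen for illustration and are not taken from the nettest specification.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What one vantage point saw when fetching a URL (illustrative fields)."""
    status_code: int
    body_length: int


def compare(probe: Observation, helper: Observation, min_ratio: float = 0.7) -> dict:
    """Compare the probe's view with the test helper's view of the same URL."""
    longest = max(probe.body_length, helper.body_length)
    shortest = min(probe.body_length, helper.body_length)
    # Two empty bodies are treated as matching, which also avoids dividing by zero.
    ratio = 1.0 if longest == 0 else shortest / longest
    return {
        "status_code_match": probe.status_code == helper.status_code,
        "body_length_match": ratio >= min_ratio,
    }


# Made-up values: the probe got a short 403 page, the helper got the full page.
result = compare(Observation(403, 512), Observation(200, 48000))
print(result)
```

A mismatch on both checks, as in this made-up case, is the kind of signal that would prompt a closer look at the measurement; it does not by itself prove that the URL is blocked.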
Nettests that require input produce one measurement for each input. When there is no input, the nettest produces a single measurement instead. In this context, a measurement is a JSON document. The data format used by measurements is specified in this repository, and every nettest adds its specific pieces of data on top of the general data format.
Measurements produced by nettests are submitted to the OONI collector in the context of the previously opened report. This is again interaction (3), where the ID of the report is used to submit measurements.
Finally, the engine tells the collector to close the report (again interaction (3)). This means that the report will not accept further measurements using the previously communicated report ID. This will also trigger the automatic archiving and processing of the measurements. These actions are performed by the OONI pipeline. Data is accessible through the OONI API and browsable using OONI Explorer. The OONI sysadmin repository contains the rules that we use to deploy and provision all the servers we control.
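The relation between inputs and measurements can be sketched as follows. The function `run_nettest`, the stand-in measurement callbacks, and the three fields shown are illustrative assumptions; the data format specification lists the actual fields.

```python
import json
from typing import Callable, Optional


def run_nettest(name: str, inputs: list, measure: Callable[[Optional[str]], dict]) -> list:
    """Produce one measurement per input, or a single one when there is no input."""
    targets = inputs if inputs else [None]
    measurements = []
    for target in targets:
        measurements.append({
            "test_name": name,             # general part of the format (illustrative)
            "input": target,               # None for nettests without input
            "test_keys": measure(target),  # nettest-specific data
        })
    return measurements


# Stand-in callbacks with made-up values; a real nettest performs network operations here.
web = run_nettest("web_connectivity",
                  ["https://example.com/", "https://example.org/"],
                  lambda url: {"accessible": True})
ndt = run_nettest("ndt", [], lambda _: {"download_kbps": 1000.0})
print(len(web), len(ndt))
print(json.dumps(ndt[0]))
```

Each dictionary serializes to one JSON document, which matches the notion of a measurement used in this document.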
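The report lifecycle described above (open, submit, close) can be modeled with a toy in-memory collector. This is only a sketch: the class `InMemoryCollector`, its method names, and the use of a random UUID as report ID are assumptions, and the real collector is a network service whose protocol is specified separately.

```python
import uuid


class InMemoryCollector:
    """Toy stand-in for a collector; it keeps reports in a dictionary."""

    def __init__(self):
        self.reports = {}  # report ID -> report state

    def open_report(self, test_name: str) -> str:
        """Prepare storage for a nettest and return the ID of the new report."""
        report_id = uuid.uuid4().hex  # hypothetical ID format
        self.reports[report_id] = {"test_name": test_name, "open": True, "measurements": []}
        return report_id

    def submit(self, report_id: str, measurement: dict) -> None:
        """Store a measurement in the context of an open report."""
        report = self.reports[report_id]
        if not report["open"]:
            raise ValueError("report is closed")
        report["measurements"].append(measurement)

    def close_report(self, report_id: str) -> None:
        """Stop accepting measurements for this report ID."""
        self.reports[report_id]["open"] = False


collector = InMemoryCollector()
rid = collector.open_report("web_connectivity")
collector.submit(rid, {"input": "https://example.com/"})
collector.close_report(rid)
try:
    collector.submit(rid, {"input": "https://example.org/"})
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The second submission is rejected because the report was already closed, mirroring the rule that a closed report accepts no further measurements under its ID.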
