183 changes: 168 additions & 15 deletions docs/deploy-bare-metal.md
@@ -121,6 +121,16 @@ Directory | Contains
`lib` | The [JAR](https://en.wikipedia.org/wiki/JAR_(file_format)) files that Pulsar uses
`logs` | Logs that the installation creates

The `conf` directory contains configuration files for various Pulsar components. Below is a brief overview of the main configuration categories:

- **JVM Configuration** (`pulsar_env.sh` / `bkenv.sh`): Controls JVM memory allocation (`PULSAR_MEM`, `BOOKIE_MEM`), garbage collection options (`PULSAR_GC`, `BOOKIE_GC`), and extra JVM options (`PULSAR_EXTRA_OPTS`, `BOOKIE_EXTRA_OPTS`) for Broker, BookKeeper, and other components.
- **Broker Configuration** (`broker.conf`): Core runtime parameters for the Pulsar Broker, including metadata store connection, cluster name, ports, message retention policies, authentication, and authorization settings.
- **BookKeeper Configuration** (`bookkeeper.conf`): Storage engine parameters for BookKeeper Bookies, including journal and ledger directories, ZooKeeper connection, compaction, and disk usage thresholds.
- **Log4j Configuration** (`log4j2.yaml`): Logging framework settings including log levels, output format, file rolling strategies, and log output directories.
- **Dynamic Configuration**: Some Broker configuration properties can be updated at runtime without restarting the service, using the `pulsar-admin` CLI tool or the Admin REST API. Dynamic configurations are stored in the metadata store (ZooKeeper) and take effect across all Brokers in the cluster.

For a complete list of all available configuration properties, see the [Pulsar Configuration Reference](https://pulsar.apache.org/reference/#/next/).
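
The dynamic configuration described above can be changed with the `pulsar-admin brokers` subcommands. The following is a minimal sketch: `update_dynamic_config` is a hypothetical wrapper introduced only for illustration, the `PULSAR_ADMIN` variable and its default path are assumptions, and `brokerShutdownTimeoutMs` is just one example of a dynamically updatable setting.

```bash
# Path to the admin CLI; override PULSAR_ADMIN if Pulsar is installed elsewhere (assumption).
PULSAR_ADMIN="${PULSAR_ADMIN:-bin/pulsar-admin}"

# Hypothetical helper: update one dynamic broker setting for the whole cluster.
update_dynamic_config() {
  "$PULSAR_ADMIN" brokers update-dynamic-config --config "$1" --value "$2"
}

# Typical session against a running cluster:
#   "$PULSAR_ADMIN" brokers list-dynamic-config      # names that can be changed at runtime
#   update_dynamic_config brokerShutdownTimeoutMs 100
#   "$PULSAR_ADMIN" brokers get-all-dynamic-config   # values currently overridden
```

Settings that do not appear in the `list-dynamic-config` output still require an edit to `broker.conf` and a broker restart.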

### Install Built-in Connectors (optional)

To use `built-in` connectors, you need to download the connectors tarball release on every broker node in one of the following ways:
@@ -297,19 +307,116 @@ You can obtain the metadata service URI of the existing BookKeeper cluster by us

[BookKeeper](https://bookkeeper.apache.org) handles all persistent data storage in Pulsar. You need to deploy a cluster of BookKeeper bookies to use Pulsar. You can choose to run a **3-bookie BookKeeper cluster**.

### Configure BookKeeper

BookKeeper configuration is split across two files:

- **`conf/bookkeeper.conf`**: Contains all BookKeeper runtime parameters, including metadata store connection, storage directories, compaction settings, and disk usage thresholds.
- **`conf/bkenv.sh`**: Contains JVM-related parameters for the Bookie process, including memory allocation (`BOOKIE_MEM`), garbage collection options (`BOOKIE_GC`), and extra JVM flags (`BOOKIE_EXTRA_OPTS`).

#### Metadata store connection

You can configure BookKeeper bookies using the [`conf/bookkeeper.conf`](reference-configuration.md#bookkeeper) configuration file. The most important step in configuring bookies for our purposes here is ensuring that `metadataServiceUri` is set to the URI for the ZooKeeper cluster. The following is an example:

```properties
metadataServiceUri=zk://zk1.us-west.example.com:2181;zk2.us-west.example.com:2181;zk3.us-west.example.com:2181/ledgers
```

:::note

Use `;` as the separator between the ZooKeeper hosts in `metadataServiceUri`.

:::

Once you have set `metadataServiceUri`, you can make any other configuration changes that you require. For more information about ZooKeeper and BookKeeper administration, see [ZooKeeper and BookKeeper administration](https://pulsar.apache.org/docs/next/administration-zk-bk/).

#### Storage directories

In a production environment, you should configure dedicated disks for journal and ledger storage. Keeping them on separate disks significantly improves write performance.

```properties
# WAL (Write-Ahead Log) directory — use a dedicated SSD for low-latency writes
journalDirectory=/data/bookkeeper/journal

# Ledger storage directory — use a separate disk from the journal
ledgerDirectories=/data/bookkeeper/ledgers
```

- `journalDirectory`: Defaults to `data/bookkeeper/journal`. The journal is a write-ahead log that records every write before it is applied to the ledger storage. Using a dedicated high-speed SSD for the journal directory is critical for write latency.
- `ledgerDirectories`: Defaults to `data/bookkeeper/ledgers`. This is where the actual ledger data is stored. Separating it from the journal directory avoids I/O contention and improves throughput.
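
One way to confirm that the journal and ledger directories really sit on different devices is to compare what `df` reports for each path. This is only a sketch: `device_of` and `same_device` are hypothetical helper names, and two partitions of the same physical disk will still show up as different devices, so treat a passing check as necessary rather than sufficient.

```bash
# Hypothetical helper: print the filesystem device that backs a path.
device_of() {
  df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Returns 0 (true) when both paths are backed by the same device.
same_device() {
  [ "$(device_of "$1")" = "$(device_of "$2")" ]
}

# Example, using the paths from the sample configuration above:
#   if same_device /data/bookkeeper/journal /data/bookkeeper/ledgers; then
#     echo "WARNING: journal and ledgers share a device" >&2
#   fi
```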

#### GC and Compaction

BookKeeper writes entries from multiple ledgers into shared Entry Log files (default max 1 GB each, controlled by `logSizeLimit`). When ledgers are deleted — for example, after Pulsar's retention policy trims expired data — the Entry Log files that contained those ledgers develop unused space. The Bookie's GC thread periodically scans for deleted ledgers and triggers compaction to reclaim disk space by rewriting the remaining valid entries into new files.

BookKeeper provides two levels of compaction:

- **Minor Compaction**: Targets Entry Log files where the valid data ratio is below `minorCompactionThreshold` (default 0.2, i.e., 20%). Runs at `minorCompactionInterval` (default: every hour). Designed to quickly reclaim heavily fragmented files.
- **Major Compaction**: Targets Entry Log files where the valid data ratio is below `majorCompactionThreshold` (default 0.5, i.e., 50%). Runs at `majorCompactionInterval` (default: every day). Covers a wider range of files with moderate fragmentation.

```properties
# GC scan interval (ms), default: 900000 (15 min)
gcWaitTime=900000

# Minor Compaction: threshold and interval
minorCompactionThreshold=0.2
minorCompactionInterval=3600

# Major Compaction: threshold and interval
majorCompactionThreshold=0.5
majorCompactionInterval=86400
```

:::note

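`minorCompactionInterval` and `majorCompactionInterval` must be greater than `gcWaitTime`; otherwise compaction will not run. Compare them in the same unit: the compaction intervals are specified in seconds, whereas `gcWaitTime` is specified in milliseconds.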

:::
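
The constraint in the note can be checked before a bookie is restarted. The sketch below assumes that the compaction intervals are expressed in seconds and `gcWaitTime` in milliseconds, as the sample values suggest; `check_compaction_intervals` is a hypothetical helper, not part of BookKeeper.

```bash
# Hypothetical helper: succeed only if both compaction intervals (seconds)
# are longer than gcWaitTime (milliseconds).
# Usage: check_compaction_intervals <gcWaitTime_ms> <minor_s> <major_s>
check_compaction_intervals() {
  gc_wait_ms="$1"; minor_s="$2"; major_s="$3"
  [ $((minor_s * 1000)) -gt "$gc_wait_ms" ] && [ $((major_s * 1000)) -gt "$gc_wait_ms" ]
}

# Example with the values shown above:
#   check_compaction_intervals 900000 3600 86400 && echo "intervals OK"
```

With the sample values, the minor interval is 3,600,000 ms against a 900,000 ms GC wait, so the check passes. Lowering `minorCompactionInterval` to 600 would make it fail.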

#### Disk usage thresholds

BookKeeper monitors disk usage and can automatically switch a Bookie to read-only mode to prevent disk exhaustion.

```properties
# Bookie enters read-only mode when disk usage exceeds this threshold (default: 0.95)
diskUsageThreshold=0.95

# Warning threshold — Major Compaction is paused when disk usage exceeds this value (default: 0.90)
diskUsageWarnThreshold=0.90

# Low water mark — Bookie returns to read-write mode only after disk usage drops below this value
# Set it lower than diskUsageWarnThreshold to avoid frequent mode switching (recommended: 0.87)
diskUsageLwmThreshold=0.87
```
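
A quick sanity check on the three thresholds can catch an accidental inversion. The ordering enforced here (low water mark below the warning threshold, warning threshold below the read-only threshold) is an assumption based on the sample values and the recommendation in the comments above; `check_disk_thresholds` is a hypothetical helper.

```bash
# Hypothetical helper: succeed only if lwm < warn < full.
# Usage: check_disk_thresholds <diskUsageLwmThreshold> <diskUsageWarnThreshold> <diskUsageThreshold>
check_disk_thresholds() {
  awk -v lwm="$1" -v warn="$2" -v full="$3" 'BEGIN {
    # awk compares these values numerically; exit status 0 means the ordering holds.
    exit !(lwm < warn && warn < full)
  }'
}

# Example with the values shown above:
#   check_disk_thresholds 0.87 0.90 0.95 && echo "thresholds OK"
```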

#### JVM configuration (bkenv.sh)

The `conf/bkenv.sh` file controls JVM parameters for the Bookie process:

- `BOOKIE_MEM`: Defaults to `-Xms2g -Xmx2g -XX:MaxDirectMemorySize=2g`. Adjust based on your storage workload. Insufficient heap memory leads to frequent GC, which increases write and read latency — especially under high throughput, GC pauses can cause write timeouts. Direct memory is primarily used for Netty ByteBuf allocation — BookKeeper defaults to the `PooledDirect` memory allocator, which allocates all ByteBuf from direct memory for network I/O and internal data handling.

```bash
# Example: increase heap and direct memory for high-throughput workloads
BOOKIE_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=4g"
```

- `BOOKIE_EXTRA_OPTS`: Passes additional JVM flags to the Bookie process. Examples:

```bash
  # Enable heap dump on OOM. The default script only enables ExitOnOutOfMemoryError;
  # without a heap dump file you cannot diagnose the root cause.
BOOKIE_EXTRA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/logs/bookie/heapdump.hprof"

# Temporarily enable Netty leak detection for troubleshooting off-heap memory leaks
# (default is disabled; set to advanced level when investigating)
BOOKIE_EXTRA_OPTS="-Dio.netty.leakDetection.level=advanced"
```
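
Because each of the two examples above assigns `BOOKIE_EXTRA_OPTS` from scratch, pasting both into `conf/bkenv.sh` would leave only the second one in effect. A minimal sketch of combining them, assuming you want both sets of flags active at once:

```bash
# Start with the heap-dump flags...
BOOKIE_EXTRA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/logs/bookie/heapdump.hprof"

# ...then append the Netty leak-detection flag instead of overwriting the variable.
BOOKIE_EXTRA_OPTS="${BOOKIE_EXTRA_OPTS} -Dio.netty.leakDetection.level=advanced"
```

Remove the leak-detection flag again once the investigation is over, since the text above describes it as a temporary troubleshooting setting.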

For a full listing of the available BookKeeper configuration parameters, see the [BookKeeper configuration reference](reference-configuration.md#bookkeeper). For a more in-depth guide, consult the [BookKeeper documentation](https://bookkeeper.apache.org/docs/next/reference/config/).

### Start BookKeepers

With the desired configuration applied in `conf/bookkeeper.conf` and `conf/bkenv.sh`, you can start up a bookie on each of your BookKeeper hosts. You can start up each bookie either in the background, using [nohup](https://en.wikipedia.org/wiki/Nohup), or in the foreground.

To start the bookie in the background, use the [`pulsar-daemon`](reference-cli-tools.md) CLI tool:

@@ -346,6 +453,13 @@ Pulsar brokers are the last thing you need to deploy in your Pulsar cluster. Bro

### Configure Brokers

Broker configuration is split across two files:

- **`conf/broker.conf`**: Contains all Broker runtime parameters, including metadata store connection, cluster name, ports, replication settings, and feature toggles.
- **`conf/pulsar_env.sh`**: Contains JVM-related parameters for the Broker process, including memory allocation (`PULSAR_MEM`), garbage collection options (`PULSAR_GC`), and extra JVM flags (`PULSAR_EXTRA_OPTS`).

#### Metadata store and cluster settings

You can configure brokers using the `conf/broker.conf` configuration file. The most important element of broker configuration is ensuring that each broker is aware of the ZooKeeper cluster that you have deployed. Ensure that the [`metadataStoreUrl`](reference-configuration.md#broker) and [`configurationMetadataStoreUrl`](reference-configuration.md#broker) parameters are correct. In this case, since you have only one cluster and no separate configuration store, `configurationMetadataStoreUrl` points to the same address as `metadataStoreUrl`.

```properties
@@ -368,18 +482,57 @@ webServicePort=8080
webServicePortTls=8443
```

> If you deploy Pulsar in a one-node cluster, you should update the replication settings in `conf/broker.conf` to `1`.
>
> ```properties
> # Number of bookies to use when creating a ledger
> managedLedgerDefaultEnsembleSize=1
>
> # Number of copies to store for each message
> managedLedgerDefaultWriteQuorum=1
>
> # Number of guaranteed copies (acks to wait before write is complete)
> managedLedgerDefaultAckQuorum=1
> ```
#### Managed ledger settings

These parameters control how the Broker creates BookKeeper ledgers for message storage. They map to the BookKeeper protocol's [Ensemble / Write Quorum / Ack Quorum](https://bookkeeper.apache.org/docs/getting-started/concepts/#ledgers) model:

```properties
# Ensemble size (E): number of bookies to use when creating a ledger (default: 2)
managedLedgerDefaultEnsembleSize=2

# Write quorum (Qw): number of copies to store for each entry (default: 2)
managedLedgerDefaultWriteQuorum=2

# Ack quorum (Qa): number of acks to wait before a write is considered complete (default: 2)
managedLedgerDefaultAckQuorum=2
```

The invariant **E ≥ Qw ≥ Qa** must hold; otherwise ledger creation will fail.

> If you deploy Pulsar in a one-node cluster, you should set all three values to `1`.
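
The invariant E ≥ Qw ≥ Qa stated above is easy to verify before you roll out a change to `conf/broker.conf`. The following sketch uses a hypothetical helper, `check_quorum`, that takes the three values as positional arguments:

```bash
# Hypothetical helper: succeed only if ensemble >= write quorum >= ack quorum.
# Usage: check_quorum <ensembleSize> <writeQuorum> <ackQuorum>
check_quorum() {
  ensemble="$1"; write_quorum="$2"; ack_quorum="$3"
  [ "$ensemble" -ge "$write_quorum" ] && [ "$write_quorum" -ge "$ack_quorum" ]
}

# Examples:
#   check_quorum 2 2 2   # defaults shown above: valid
#   check_quorum 1 1 1   # one-node cluster: valid
#   check_quorum 2 3 2   # write quorum larger than the ensemble: invalid
```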

#### JVM configuration (pulsar_env.sh)

The `conf/pulsar_env.sh` file controls JVM parameters for the Broker process:

- `PULSAR_MEM`: Defaults to `-Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g`. Adjust based on your machine's available memory. Insufficient heap memory leads to frequent GC, and GC pauses increase message publish and consume latency — in severe cases, Full GC can make the Broker temporarily unavailable. Direct memory is critical for the Broker's message caching and Netty I/O operations.

```bash
# Example: increase heap and direct memory for production workloads
PULSAR_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g"
```

- `PULSAR_EXTRA_OPTS`: Passes additional JVM flags to the Broker/Proxy/ZooKeeper process. Since `PULSAR_EXTRA_OPTS` is appended after other JVM options on the command line, it can also be used to **override** existing JVM parameters defined in `pulsar_env.sh` or the `bin/pulsar` startup script (later flags take precedence). Examples:

```bash
  # Enable heap dump on OOM. The default script only enables ExitOnOutOfMemoryError;
  # without a heap dump file you cannot diagnose the root cause after the process exits.
PULSAR_EXTRA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/logs/pulsar/heapdump.hprof"

# Enable IPv6 support (the default script sets -Djava.net.preferIPv4Stack=true;
# override this if your deployment uses IPv6 networking)
PULSAR_EXTRA_OPTS="-Djava.net.preferIPv4Stack=false"

# Tune Netty memory pool parameters (increase maxOrder and maxCachedBufferCapacity
# if your messages are large, to avoid Netty bypassing the memory pool for allocation)
PULSAR_EXTRA_OPTS="-Dio.netty.allocator.maxOrder=13 -Dio.netty.allocator.numDirectArenas=8 -Dio.netty.allocator.maxCachedBufferCapacity=1048576"
```
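
When you size `PULSAR_MEM` against the machine's available memory, it helps to add up the maximum heap and the direct memory limit, since the JVM can use both at the same time. The sketch below does that for a `PULSAR_MEM`-style string. `to_mib` and `jvm_budget_mib` are hypothetical helpers, only the `g` and `m` size suffixes are handled, and the result is a lower bound because the JVM also needs memory for metaspace, thread stacks, and similar overhead.

```bash
# Hypothetical helper: convert a JVM size such as 4g or 512m to MiB.
to_mib() {
  case "$1" in
    *[gG]) echo $(( ${1%[gG]} * 1024 )) ;;
    *[mM]) echo "${1%[mM]}" ;;
    *) echo "unsupported size: $1" >&2; return 1 ;;
  esac
}

# Sum the maximum heap (-Xmx) and direct memory (-XX:MaxDirectMemorySize) in MiB.
jvm_budget_mib() {
  heap=$(printf '%s\n' "$1" | sed -n 's/.*-Xmx\([0-9]*[gGmM]\).*/\1/p')
  direct=$(printf '%s\n' "$1" | sed -n 's/.*-XX:MaxDirectMemorySize=\([0-9]*[gGmM]\).*/\1/p')
  echo $(( $(to_mib "$heap") + $(to_mib "$direct") ))
}

# Example:
#   jvm_budget_mib "-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g"
```

The same calculation applies to `BOOKIE_MEM` on bookie hosts. If a broker and a bookie share a machine, add the two results together before comparing them with the physical memory.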

:::tip

You can also refer to the default configuration in the [Pulsar Helm Chart values.yaml](https://github.com/apache/pulsar-helm-chart/blob/master/charts/pulsar/values.yaml) as a tuning reference.

:::


### Enable Pulsar Functions (optional)