24 changes: 12 additions & 12 deletions docs/understanding/terminologies/olake.mdx
@@ -94,24 +94,24 @@ OLake supports 4 distinct sync modes:

### 3. Data Filter

The data filter feature allows selective ingestion from source databases by applying SQL-style `WHERE` clauses or BSON-based conditions during ingestion.
The data filter feature allows selective ingestion from source systems by applying filtering conditions before writing to the destination, so only the required subset of data is replicated.

- Ensures only selected data enters the pipeline, saving on transfer, storage, and processing.
- Supports combining up to two conditions with logical operators (AND/OR).
- Operators: `>`, `<`, `=`, `!=`, `>=`, `<=`
- Values can be `numbers`, `quoted strings/timestamps/ids (e.g. created_at > \"2025-08-21 17:38:35.017\")`, or `null`.
- Values can be `numbers`, `quoted strings/timestamps/ids (e.g. created_at > 2025-08-21 17:38:35.017)`, or `null`.
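
The rules listed above (at most two conditions, one of six operators, a number, string, or `null` value) can be illustrated with a small validator. This is only a sketch under stated assumptions: `parse_filter` and `classify_value` are hypothetical names, the grammar is inferred from the bullets, and OLake's actual parser may behave differently. It accepts string values with or without surrounding double quotes and does not handle the words AND/OR inside a value.

```python
import re

# Hypothetical grammar inferred from the bullets above; not OLake's real parser.
CONDITION = re.compile(r'^\s*(?P<column>\w+)\s*(?P<op>>=|<=|!=|>|<|=)\s*(?P<value>.+?)\s*$')
NUMBER = re.compile(r'-?\d+(\.\d+)?')


def classify_value(raw):
    """Map the raw value text to None, a number, or a string."""
    if raw == "null":
        return None
    if NUMBER.fullmatch(raw):
        return float(raw)
    return raw.strip('"')  # strings, timestamps, and ids stay as text


def parse_filter(expr):
    """Split an expression into (conditions, connectors), enforcing the two-condition limit."""
    # The capturing group keeps the AND/OR connectors in the split result.
    parts = re.split(r'\s+(AND|OR)\s+', expr, flags=re.IGNORECASE)
    raw_conditions, connectors = parts[0::2], parts[1::2]
    if len(raw_conditions) > 2:
        raise ValueError("at most two conditions can be combined")
    conditions = []
    for raw in raw_conditions:
        match = CONDITION.match(raw)
        if match is None:
            raise ValueError(f"invalid condition: {raw!r}")
        conditions.append((match["column"], match["op"], classify_value(match["value"])))
    return conditions, [c.upper() for c in connectors]
```

For example, `parse_filter('age >= 18 AND city = "Pune"')` yields two conditions joined by `AND`, while an expression with three conditions raises `ValueError`.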

**Adoption of filter in drivers:**
- **Postgres:** During chunk processing, filters are applied alongside chunk conditions, ensuring only matching records are ingested—even with CTID-based chunking.
- **MySQL:** During chunk processing, filters are applied within each chunk so only relevant rows are returned, even with limit-offset chunking.
- **MongoDB:** During chunk processing, filters are enforced in the aggregation pipeline’s $match stage to ensure only compliant documents are processed.
- **Oracle:** Similar to Postgres and MySQL, filters are applied within each chunk’s scan, guaranteeing only records satisfying conditions are ingested.
- **DB2:** Similar to Postgres and MySQL, filters are applied within each chunk’s scan, guaranteeing only records satisfying conditions are ingested.
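
As a rough illustration of how a filter can sit alongside a chunk condition, the sketch below combines the two for a SQL source and for a MongoDB `$match` stage. The helper names `chunk_query` and `mongo_match_stage` are hypothetical and the query shapes are assumptions for illustration; they are not taken from the OLake drivers.

```python
def chunk_query(table, chunk_condition, filter_condition=None):
    """Build a SELECT that scans one chunk and keeps only rows matching the filter."""
    where = f"({chunk_condition})"
    if filter_condition:
        # The filter narrows the chunk; it never widens it.
        where += f" AND ({filter_condition})"
    return f"SELECT * FROM {table} WHERE {where}"


def mongo_match_stage(chunk_match, filter_match=None):
    """Build a $match stage requiring both the chunk bounds and the filter."""
    clauses = [chunk_match]
    if filter_match:
        clauses.append(filter_match)
    return {"$match": {"$and": clauses}}


# Example: a CTID-bounded Postgres chunk with a filter on amount.
sql = chunk_query(
    "orders",
    "ctid >= '(0,0)'::tid AND ctid < '(100,0)'::tid",
    "amount > 100",
)
```

Wrapping each side in parentheses keeps an `OR` inside the filter from escaping the chunk bounds.
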
:::note
Data filter is supported **only when Normalisation is enabled** for the job.
:::

:::info CDC/Incremental Filter Behavior
- From **OLake connector v0.6.0** and **OLake UI v0.4.1** onward, the data filter is also available for **CDC and Incremental sync**: the filter configured for **Full Refresh** is applied during subsequent **CDC and Incremental** syncs.
- Data filtering for CDC and Incremental is available **only for jobs created on OLake connector v0.6.0 or later**. For jobs created on earlier versions, a data filter configured for Full Refresh is not applied during CDC and Incremental syncs, even after OLake is upgraded; a new job must be created to use data filtering for CDC and Incremental.
- If you update an existing job’s filter after it has been created and scheduled (for example, changing conditions a few days later), OLake automatically performs **Clear Destination**, and the next sync runs as a **Full Refresh** that applies the new filter conditions.
- If a job was originally created without any filter and you later add one, OLake likewise performs **Clear Destination**, and the next sync is a **Full Refresh** that uses the newly added filter.
:::
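
The behavior described in the admonition above can be summarized as a small decision function. This is a sketch only: `plan_next_sync`, `SyncPlan`, and the version parsing are hypothetical and not part of OLake. It also assumes that any difference between the previous and current filter counts as a change, regardless of the version the job was created on, which the text above does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional

# Connector version from which CDC/Incremental filtering applies (from the note above).
MIN_FILTER_VERSION = (0, 6, 0)


def parse_version(version: str) -> tuple:
    """Turn 'v0.6.0' or '0.6.0' into (0, 6, 0)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


@dataclass
class SyncPlan:
    clear_destination: bool
    mode: str  # "full_refresh" or "cdc_or_incremental"
    filter_applied: bool


def plan_next_sync(created_on: str,
                   previous_filter: Optional[str],
                   current_filter: Optional[str]) -> SyncPlan:
    # Assumption: adding or editing a filter both count as a change.
    if previous_filter != current_filter:
        return SyncPlan(True, "full_refresh", current_filter is not None)
    # Unchanged filter: it carries into CDC/Incremental only for jobs
    # created on connector v0.6.0 or later.
    supported = parse_version(created_on) >= MIN_FILTER_VERSION
    return SyncPlan(False, "cdc_or_incremental",
                    supported and current_filter is not None)
```

For example, a job created on v0.5.2 with an unchanged filter continues with CDC or Incremental syncs without the filter applied, while adding a filter to a job created on v0.6.1 leads to Clear Destination followed by a Full Refresh.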

:::note
If DB2 is the source, timestamp values in the filter should use the format `2025-01-01 10:15:30.123456`.
:::

<div style={{ textAlign: "center" }}>
<img src="/img/docs/terminologies/data-filter.webp"
alt="Olake Partition output"