From fd63cae6f5fe8dd7ca4d74031c0274f9bd4fb4fa Mon Sep 17 00:00:00 2001 From: billy-the-fish Date: Fri, 12 Dec 2025 12:12:39 +0100 Subject: [PATCH 1/3] chore: update glossary internal links. --- api/glossary.md | 102 ++++++++++++++++++++++++++---------------------- 1 file changed, 55 insertions(+), 47 deletions(-) diff --git a/api/glossary.md b/api/glossary.md index 68c7274415..9a1868b54a 100644 --- a/api/glossary.md +++ b/api/glossary.md @@ -15,7 +15,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **ACID**: a set of properties (atomicity, consistency, isolation, durability) that guarantee database transactions are processed reliably. -**ACID compliance**: a set of database properties—Atomicity, Consistency, Isolation, Durability—ensuring reliable and consistent transactions. Inherited from [$PG](#postgresql). +**ACID compliance**: a set of database properties—Atomicity, Consistency, Isolation, Durability—ensuring reliable and consistent transactions. Inherited from [$PG][postgres-link]. **Adaptive query optimization**: dynamic query plan adjustment based on actual execution statistics and data distribution patterns, improving performance over time. @@ -41,7 +41,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Background job**: an automated task that runs in the background without user intervention, typically for maintenance operations like compression or data retention. -**Background worker**: a [$PG](#postgresql) process that runs background tasks independently of client sessions. +**Background worker**: a [$PG][postgres-link] process that runs background tasks independently of client sessions. **Batch processing**: handling data in grouped batches rather than as individual real-time events, often used for historical data processing. 
@@ -49,13 +49,13 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Backup**: a copy of data stored separately from the original data to protect against data loss, corruption, or system failure. -**Bloom filter**: a probabilistic data structure that tests set membership with possible false positives but no false negatives. [$TIMESCALE_DB](#timescaledb) uses blocked bloom filters to speed up point lookups by eliminating [chunks](#chunk) that don't contain queried values. +**Bloom filter**: a probabilistic data structure that tests set membership with possible false positives but no false negatives. [$TIMESCALE_DB][timescaledb-link] uses blocked bloom filters to speed up point lookups by eliminating [chunks][chunk-link] that don't contain queried values. **Buffer pool**: memory area where frequently accessed data pages are cached to reduce disk I/O operations. -**BRIN (Block Range Index)**: a [$PG](#postgresql) index type that stores summaries about ranges of table blocks, useful for large tables with naturally ordered data. +**BRIN (Block Range Index)**: a [$PG][postgres-link] index type that stores summaries about ranges of table blocks, useful for large tables with naturally ordered data. -**Bytea**: a [$PG](#postgresql) data type for storing binary data as a sequence of bytes. +**Bytea**: a [$PG][postgres-link] data type for storing binary data as a sequence of bytes. ## C @@ -67,7 +67,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN -**Chunk**: a horizontal partition of a [$HYPERTABLE](#hypertable) that contains data for a specific time interval and space partition. See [chunks][use-hypertables-chunks]. +**Chunk**: a horizontal partition of a [$HYPERTABLE][hypertable-link] that contains data for a specific time interval and space partition. See [chunks][use-hypertables-chunks]. 
**Chunk interval**: the time period covered by each chunk in a $HYPERTABLE, which affects query performance and storage efficiency. @@ -81,7 +81,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Cloud**: computing services delivered over the internet, including servers, storage, databases, networking, software, analytics, and intelligence. -**Cloud deployment**: the use of public, private, or hybrid cloud infrastructure to host [$TIMESCALE_DB](#timescaledb), enabling elastic scalability and managed services. +**Cloud deployment**: the use of public, private, or hybrid cloud infrastructure to host [$TIMESCALE_DB][timescaledb-link], enabling elastic scalability and managed services. **Cloud-native**: an approach to building applications that leverage cloud infrastructure, scalability, and services like Kubernetes. @@ -89,7 +89,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Columnar**: a data storage format that stores data column by column rather than row by row, optimizing for analytical queries. -**Columnstore**: [$TIMESCALE_DB](#timescaledb)'s columnar storage engine optimized for analytical workloads and [compression](#compression). +**Columnstore**: [$TIMESCALE_DB][timescaledb-link]'s columnar storage engine optimized for analytical workloads and [compression][compression-link]. @@ -169,13 +169,13 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Exactly-once**: a message is delivered and processed precisely once. There is no loss and no duplicates. -**Explain**: a [$PG](#postgresql) command that shows the execution plan for a query, useful for performance analysis. +**Explain**: a [$PG][postgres-link] command that shows the execution plan for a query, useful for performance analysis. **Event sourcing**: an architectural pattern storing all changes as a sequence of events, naturally fitting time-series database capabilities. 
**Event-driven architecture**: a design pattern where components react to events such as sensor readings, requiring real-time data pipelines and storage. -**Extension**: a [$PG](#postgresql) add-on that extends the database's functionality beyond the core features. +**Extension**: a [$PG][postgres-link] add-on that extends the database's functionality beyond the core features. ## F @@ -183,7 +183,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Failover**: the automatic switching to a backup system, server, or network upon the failure or abnormal termination of the primary system. -**Financial time-series**: high-volume, timestamped datasets like stock market feeds or trade logs, requiring low-latency, scalable databases like [$TIMESCALE_DB](#timescaledb). +**Financial time-series**: high-volume, timestamped datasets like stock market feeds or trade logs, requiring low-latency, scalable databases like [$TIMESCALE_DB][timescaledb-link]. **Foreign key**: a database constraint that establishes a link between data in two tables by referencing the primary key of another table. @@ -191,7 +191,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN -**Free $SERVICE_SHORT**: a free instance of $CLOUD_LONG with limited resources. You can create up to two free $SERVICE_SHORTs under any $PRICING_PLAN. When a free $SERVICE_SHORT reaches the resource limit, it converts to the read-only state. You can convert a free $SERVICE_SHORT to a [standard one](#standard-tiger-service) under paid $PRICING_PLANs. +**Free $SERVICE_SHORT**: a free instance of $CLOUD_LONG with limited resources. You can create up to two free $SERVICE_SHORTs under any $PRICING_PLAN. When a free $SERVICE_SHORT reaches the resource limit, it converts to the read-only state. You can convert a free $SERVICE_SHORT to a [standard one][standard-tiger-service-link] under paid $PRICING_PLANs. 
**FTP (File Transfer Protocol)**: a standard network protocol used for transferring files between a client and server on a computer network. @@ -199,13 +199,13 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Gap filling**: a technique for handling missing data points in time-series by interpolation or other methods, often implemented with hyperfunctions. -**GIN (Generalized Inverted Index)**: a [$PG](#postgresql) index type designed for indexing composite values and supporting fast searches. +**GIN (Generalized Inverted Index)**: a [$PG][postgres-link] index type designed for indexing composite values and supporting fast searches. -**GiST (Generalized Search Tree)**: a [$PG](#postgresql) index type that provides a framework for implementing custom index types. +**GiST (Generalized Search Tree)**: a [$PG][postgres-link] index type that provides a framework for implementing custom index types. **GP-LTTB**: an advanced downsampling algorithm that extends Largest-Triangle-Three-Buckets with Gaussian Process modeling. -**GUC (Grand Unified Configuration)**: [$PG](#postgresql)'s configuration parameter system that controls various aspects of database behavior. +**GUC (Grand Unified Configuration)**: [$PG][postgres-link]'s configuration parameter system that controls various aspects of database behavior. **GUID (Globally Unique Identifier)**: a unique identifier used in software applications, typically represented as a 128-bit value. @@ -231,17 +231,17 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Hot storage**: a tier of data storage for frequently accessed data that provides the fastest access times but at higher cost. -**Hypercore**: [$TIMESCALE_DB](#timescaledb)'s hybrid storage engine that seamlessly combines row and column storage for optimal performance. See [Hypercore][use-hypercore]. 
+**Hypercore**: [$TIMESCALE_DB][timescaledb-link]'s hybrid storage engine that seamlessly combines row and column storage for optimal performance. See [Hypercore][use-hypercore]. -**Hyperfunction**: an SQL function in [$TIMESCALE_DB](#timescaledb) designed for time-series analysis, statistics, and specialized computations. See [Hyperfunctions][use-hyperfunctions]. +**Hyperfunction**: an SQL function in [$TIMESCALE_DB][timescaledb-link] designed for time-series analysis, statistics, and specialized computations. See [Hyperfunctions][use-hyperfunctions]. **HyperLogLog**: a probabilistic data structure used for estimating the cardinality of large datasets with minimal memory usage. -**Hypershift**: a migration tool and strategy for moving data to [$TIMESCALE_DB](#timescaledb) with minimal downtime. +**Hypershift**: a migration tool and strategy for moving data to [$TIMESCALE_DB][timescaledb-link] with minimal downtime. -**Hypertable**: [$TIMESCALE_DB](#timescaledb)'s core abstraction that automatically partitions time-series data for scalability. See [Hypertables][use-hypertables]. +**Hypertable**: [$TIMESCALE_DB][timescaledb-link]'s core abstraction that automatically partitions time-series data for scalability. See [Hypertables][use-hypertables]. ## I @@ -271,7 +271,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Job execution**: the process of running scheduled background tasks or automated procedures. -**JIT (Just-In-Time) compilation**: [$PG](#postgresql) feature that compiles frequently executed query parts for improved performance, available in [$TIMESCALE_DB](#timescaledb). +**JIT (Just-In-Time) compilation**: [$PG][postgres-link] feature that compiles frequently executed query parts for improved performance, available in [$TIMESCALE_DB][timescaledb-link]. **Job history**: a record of past job executions, including their status, duration, and any errors encountered. 
@@ -289,7 +289,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Load balancer**: a service distributing traffic across servers or database nodes to optimize resource use and avoid single points of failure. -**Log-Structured Merge (LSM) Tree**: a data structure optimized for write-heavy workloads, though [$TIMESCALE_DB](#timescaledb) primarily uses B-tree indexes for balanced read/write performance. +**Log-Structured Merge (LSM) Tree**: a data structure optimized for write-heavy workloads, though [$TIMESCALE_DB][timescaledb-link] primarily uses B-tree indexes for balanced read/write performance. **LlamaIndex**: a framework for building applications with large language models, providing tools for data ingestion and querying. @@ -297,7 +297,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Logical backup**: a backup method that exports data in a human-readable format, allowing for selective restoration. -**Logical replication**: a [$PG](#postgresql) feature that replicates data changes at the logical level rather than the physical level. +**Logical replication**: a [$PG][postgres-link] feature that replicates data changes at the logical level rather than the physical level. **Logging**: the process of recording events, errors, and system activities for monitoring and troubleshooting purposes. @@ -329,7 +329,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **MQTT (Message Queuing Telemetry Transport)**: a lightweight messaging protocol designed for small sensors and mobile devices. -**MST (Managed Service for TimescaleDB)**: a fully managed [$TIMESCALE_DB](#timescaledb) service that handles infrastructure and maintenance tasks. +**MST (Managed Service for TimescaleDB)**: a fully managed [$TIMESCALE_DB][timescaledb-link] service that handles infrastructure and maintenance tasks. 
## N @@ -341,7 +341,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Not null**: a database constraint that ensures a column cannot contain empty values. -**Numeric**: a [$PG](#postgresql) data type for storing exact numeric values with user-defined precision. +**Numeric**: a [$PG][postgres-link] data type for storing exact numeric values with user-defined precision. ## O @@ -367,7 +367,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Parallel copy**: a technique for copying large amounts of data using multiple concurrent processes to improve performance. -**Parallel Query Execution**: a [$PG](#postgresql) feature that uses multiple CPU cores to execute single queries faster, inherited by [$TIMESCALE_DB](#timescaledb). +**Parallel Query Execution**: a [$PG][postgres-link] feature that uses multiple CPU cores to execute single queries faster, inherited by [$TIMESCALE_DB][timescaledb-link]. **Partitioning**: the practice of dividing large tables into smaller, more manageable pieces based on certain criteria. @@ -375,19 +375,19 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Performance**: a measure of how efficiently a system operates, often quantified by metrics like throughput, latency, and resource utilization. -**pg_basebackup**: a [$PG](#postgresql) utility for taking base backups of a running [$PG](#postgresql) cluster. +**pg_basebackup**: a [$PG][postgres-link] utility for taking base backups of a running [$PG][postgres-link] cluster. -**pg_dump**: a [$PG](#postgresql) utility for backing up database objects and data in various formats. +**pg_dump**: a [$PG][postgres-link] utility for backing up database objects and data in various formats. -**pg_restore**: a [$PG](#postgresql) utility for restoring databases from backup files created by `pg_dump`. 
+**pg_restore**: a [$PG][postgres-link] utility for restoring databases from backup files created by `pg_dump`. -**pgVector**: a [$PG](#postgresql) extension that adds vector similarity search capabilities for AI and machine learning applications. See [pgvector][ai-pgvector]. +**pgVector**: a [$PG][postgres-link] extension that adds vector similarity search capabilities for AI and machine learning applications. See [pgvector][ai-pgvector]. -**pgai on $CLOUD_LONG**: a cloud solution for building search, RAG, and AI agents with [$PG](#postgresql). Enables calling AI embedding and generation models directly from the database using SQL. See [pgai][ai-pgai]. +**pgai on $CLOUD_LONG**: a cloud solution for building search, RAG, and AI agents with [$PG][postgres-link]. Enables calling AI embedding and generation models directly from the database using SQL. See [pgai][ai-pgai]. **pgvectorscale**: a performance enhancement for pgvector featuring StreamingDiskANN indexing, binary quantization compression, and label-based filtering. See [pgvectorscale][ai-pgvectorscale]. -**pgvectorizer**: a [$TIMESCALE_DB](#timescaledb) tool for automatically vectorizing and indexing data for similarity search. +**pgvectorizer**: a [$TIMESCALE_DB][timescaledb-link] tool for automatically vectorizing and indexing data for similarity search. **Physical backup**: a backup method that copies the actual database files at the storage level. @@ -401,11 +401,11 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **$PG**: an open-source object-relational database system known for its reliability, robustness, and performance. -**PostGIS**: a [$PG](#postgresql) extension that adds support for geographic objects and spatial queries. +**PostGIS**: a [$PG][postgres-link] extension that adds support for geographic objects and spatial queries. **Primary key**: a database constraint that uniquely identifies each row in a table. 
-**psql**: an interactive terminal-based front-end to [$PG](#postgresql) that allows users to type queries interactively. +**psql**: an interactive terminal-based front-end to [$PG][postgres-link] that allows users to type queries interactively. ## Q @@ -435,7 +435,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Real-time analytics**: the immediate analysis of incoming data streams, crucial for observability, trading platforms, and IoT monitoring. -**Real**: a [$PG](#postgresql) data type for storing single-precision floating-point numbers. +**Real**: a [$PG][postgres-link] data type for storing single-precision floating-point numbers. **Real-time aggregate**: a continuous aggregate that includes both materialized historical data and real-time calculations on recent data. @@ -481,11 +481,11 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Service discovery**: mechanisms allowing applications to dynamically locate services like database endpoints, often used in distributed environments. -**Segmentwise recompression**: a [$TIMESCALE_DB](#timescaledb) [compression](#compression) technique that recompresses data segments to improve [compression](#compression) ratios. +**Segmentwise recompression**: a [$TIMESCALE_DB][timescaledb-link] [compression][compression-link] technique that recompresses data segments to improve [compression][compression-link] ratios. **Serializable**: the highest isolation level that ensures transactions appear to run serially even when executed concurrently. -**Service**: see [$SERVICE_LONG](#tiger-service). +**Service**: see [$SERVICE_LONG][tiger-service-link]. **Sharding**: horizontal partitioning of data across multiple database instances, distributing load and enabling linear scalability. 
@@ -507,7 +507,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Snapshot**: a point-in-time copy of data that can be used for backup and recovery purposes. -**SP-GiST (Space-Partitioned Generalized Search Tree)**: a [$PG](#postgresql) index type for data structures that naturally partition search spaces. +**SP-GiST (Space-Partitioned Generalized Search Tree)**: a [$PG][postgres-link] index type for data structures that naturally partition search spaces. **Storage optimization**: techniques for reducing storage costs and improving performance through compression, tiering, and efficient data organization. @@ -521,9 +521,9 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN -**Standard $SERVICE_SHORT**: a regular [$SERVICE_LONG](#tiger-service) that includes the resources and features according to the pricing plan. You can create standard $SERVICE_SHORTs under any of the paid plans. +**Standard $SERVICE_SHORT**: a regular [$SERVICE_LONG][tiger-service-link] that includes the resources and features according to the pricing plan. You can create standard $SERVICE_SHORTs under any of the paid plans. -**Streaming replication**: a [$PG](#postgresql) replication method that continuously sends write-ahead log records to standby servers. +**Streaming replication**: a [$PG][postgres-link] replication method that continuously sends write-ahead log records to standby servers. **Synthetic monitoring**: simulated transactions or probes used to test system health, generating time-series metrics for performance analysis. @@ -531,7 +531,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Table**: a database object that stores data in rows and columns, similar to a spreadsheet. -**Tablespace**: a [$PG](#postgresql) storage structure that defines where database objects are physically stored on disk. 
+**Tablespace**: a [$PG][postgres-link] storage structure that defines where database objects are physically stored on disk. **TCP (Transmission Control Protocol)**: a connection-oriented protocol that ensures reliable data transmission between applications. @@ -539,19 +539,19 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Telemetry**: the collection of real-time data from systems or devices for monitoring and analysis. -**Text**: a [$PG](#postgresql) data type for storing variable-length character strings. +**Text**: a [$PG][postgres-link] data type for storing variable-length character strings. **Throughput**: a measure of system performance indicating the amount of work performed or data processed per unit of time. **Tiered storage**: a storage strategy that automatically moves data between different storage classes based on access patterns and age. -**$CLOUD_LONG**: $COMPANY's managed cloud platform that provides [$TIMESCALE_DB](#timescaledb) as a fully managed solution with additional features. +**$CLOUD_LONG**: $COMPANY's managed cloud platform that provides [$TIMESCALE_DB][timescaledb-link] as a fully managed solution with additional features. **Tiger Lake**: $COMPANY's service for integrating operational databases with data lake architectures. -**$SERVICE_LONG**: an instance of optimized [$PG](#postgresql) extended with database engine innovations such as [$TIMESCALE_DB](#timescaledb), in a cloud infrastructure that delivers speed without sacrifice. You can create [free $SERVICE_SHORTs](#free-tiger-service) and [standard $SERVICE_SHORTs](#standard-tiger-service). +**$SERVICE_LONG**: an instance of optimized [$PG][postgres-link] extended with database engine innovations such as [$TIMESCALE_DB][timescaledb-link], in a cloud infrastructure that delivers speed without sacrifice. You can create [free $SERVICE_SHORTs][free-tiger-service-link] and [standard $SERVICE_SHORTs][standard-tiger-service-link]. 
**Time series**: data points indexed and ordered by time, typically representing how values change over time. @@ -563,11 +563,11 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN -**$TIMESCALE_DB**: an open-source [$PG](#postgresql) extension for real-time analytics that provides scalability and performance optimizations. +**$TIMESCALE_DB**: an open-source [$PG][postgres-link] extension for real-time analytics that provides scalability and performance optimizations. **Timestamp**: a data type that stores date and time information without timezone data. -**Timestamptz**: a [$PG](#postgresql) data type that stores timestamp with timezone information. +**Timestamptz**: a [$PG][postgres-link] data type that stores timestamp with timezone information. **TLS (Transport Layer Security)**: a cryptographic protocol that provides security for communication over networks. @@ -595,7 +595,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN ## V -**Vacuum**: a [$PG](#postgresql) maintenance operation that reclaims storage and updates database statistics. +**Vacuum**: a [$PG][postgres-link] maintenance operation that reclaims storage and updates database statistics. **Varchar**: a variable-length character data type that can store strings up to a specified maximum length. @@ -613,7 +613,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN ## W -**WAL (Write-Ahead Log)**: [$PG](#postgresql)'s method for ensuring data integrity by writing changes to a log before applying them to data files. +**WAL (Write-Ahead Log)**: [$PG][postgres-link]'s method for ensuring data integrity by writing changes to a log before applying them to data files. **Warm storage**: a storage tier that balances access speed and cost, suitable for data accessed occasionally. 
@@ -658,3 +658,11 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN [hyperfunctions-asap-smooth]: /use-timescale/:currentVersion:/hyperfunctions/gapfilling-interpolation/ [hyperfunctions-candlestick-agg]: /use-timescale/:currentVersion:/hyperfunctions/stats-aggs/ [hyperfunctions-stats-agg]: /use-timescale/:currentVersion:/hyperfunctions/stats-aggs/ +[postgres-link]: /api/:currentVersion:/glossary/#postgresql +[timescaledb-link]: /api/:currentVersion:/glossary/#timescaledb +[chunk-link]: /api/:currentVersion:/glossary/#chunk +[hypertable-link]: /api/:currentVersion:/glossary/#hypertable +[compression-link]: /api/:currentVersion:/glossary/#compression +[tiger-service-link]: /api/:currentVersion:/glossary/#tiger-service +[free-tiger-service-link]: /api/:currentVersion:/glossary/#free-tiger-service +[standard-tiger-service-link]: /api/:currentVersion:/glossary/#standard-tiger-service From 49a0456755e374ea87e934595aa27226f3f2d9ad Mon Sep 17 00:00:00 2001 From: billy-the-fish Date: Mon, 12 Jan 2026 15:09:15 +0100 Subject: [PATCH 2/3] chore: pg_textsearch v0.3.0. --- use-timescale/extensions/pg-textsearch.md | 33 ++++++++++++++--------- 1 file changed, 21 insertions(+), 12 deletions(-) diff --git a/use-timescale/extensions/pg-textsearch.md b/use-timescale/extensions/pg-textsearch.md index 9879aedc00..bbad503a18 100644 --- a/use-timescale/extensions/pg-textsearch.md +++ b/use-timescale/extensions/pg-textsearch.md @@ -12,10 +12,11 @@ import IntegrationPrereqs from "versionContent/_partials/_integration-prereqs.md # Optimize full text search with BM25 -$PG full-text search at scale consistently hits a wall where performance degrades catastrophically. +$PG full-text search at scale consistently hits a wall where performance degrades catastrophically. $COMPANY's [pg_textsearch][pg_textsearch-github-repo] brings modern [BM25][bm25-wiki]-based full-text search directly into $PG, -with a memtable architecture for efficient indexing and ranking.
`pg_textsearch` integrates seamlessly with SQL and -provides better search quality and performance than the $PG built-in full-text search. +with a memtable architecture for efficient indexing and ranking. `pg_textsearch` integrates seamlessly with SQL and +provides better search quality and performance than the $PG built-in full-text search. With Block-Max WAND optimization, +`pg_textsearch` delivers up to **4x faster top-k queries** compared to naive BM25 implementations. BM25 scores in `pg_textsearch` are returned as negative values, where lower (more negative) numbers indicate better matches. `pg_textsearch` implements the following: @@ -117,14 +118,16 @@ You have created a BM25 index for full-text search. ## Optimize search queries for performance -Use efficient query patterns to leverage BM25 ranking and optimize search performance. +Use efficient query patterns to leverage BM25 ranking and optimize search performance. The `<@>` operator supports both +simple text queries and explicit index specification with `to_bm25query()`. Use the simple syntax for `ORDER BY` queries, +but `to_bm25query()` is required for `WHERE` clauses, standalone expressions, and inside PL/pgSQL functions. 1. **Perform ranked searches using the distance operator** ```sql - SELECT name, description, description <@> to_bm25query('ergonomic work', 'products_search_idx') as score + SELECT name, description, description <@> 'ergonomic work' as score FROM products ORDER BY score LIMIT 3; @@ -159,11 +162,11 @@ Use efficient query patterns to leverage BM25 ranking and optimize search perfor 1. 
**Combine with standard SQL operations** ```sql - SELECT category, name, description <@> to_bm25query('ergonomic', 'products_search_idx') as score + SELECT category, name, description <@> 'ergonomic' as score FROM products WHERE price < 500 AND description <@> to_bm25query('ergonomic', 'products_search_idx') < -0.5 - ORDER BY description <@> to_bm25query('ergonomic', 'products_search_idx') + ORDER BY score LIMIT 5; ``` @@ -179,7 +182,7 @@ Use efficient query patterns to leverage BM25 ranking and optimize search perfor ```sql EXPLAIN SELECT * FROM products - ORDER BY description <@> to_bm25query('ergonomic', 'products_search_idx') + ORDER BY description <@> 'ergonomic' LIMIT 5; ``` @@ -255,9 +258,9 @@ Combine `pg_textsearch` with `pgvector` or `pgvectorscale` to build powerful hyb ), keyword_search AS ( SELECT id, - ROW_NUMBER() OVER (ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx')) AS rank + ROW_NUMBER() OVER (ORDER BY content <@> 'query performance') AS rank FROM articles - ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx') + ORDER BY content <@> 'query performance' LIMIT 20 ) SELECT a.id, @@ -295,9 +298,9 @@ Combine `pg_textsearch` with `pgvector` or `pgvectorscale` to build powerful hyb ), keyword_search AS ( SELECT id, - ROW_NUMBER() OVER (ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx')) AS rank + ROW_NUMBER() OVER (ORDER BY content <@> 'query performance') AS rank FROM articles - ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx') + ORDER BY content <@> 'query performance' LIMIT 20 ) SELECT @@ -350,6 +353,12 @@ Customize `pg_textsearch` behavior for your specific use case and data character -- Set default query limit when no LIMIT clause is present (default 1000) SET pg_textsearch.default_limit = 5000; + + -- Enable Block-Max WAND optimization for faster top-k queries (enabled by default) + SET pg_textsearch.enable_bmw = true; + + -- Log block 
skip statistics for debugging query performance (disabled by default) + SET pg_textsearch.log_bmw_stats = false; ``` From a3e9892618727f60f94e383974caf841ebde87b7 Mon Sep 17 00:00:00 2001 From: billy-the-fish Date: Tue, 13 Jan 2026 11:49:05 +0100 Subject: [PATCH 3/3] chore: update for 0.4.0 release note --- use-timescale/extensions/pg-textsearch.md | 31 +++++++++++++++-------- 1 file changed, 20 insertions(+), 11 deletions(-) diff --git a/use-timescale/extensions/pg-textsearch.md b/use-timescale/extensions/pg-textsearch.md index bbad503a18..e5ed49c289 100644 --- a/use-timescale/extensions/pg-textsearch.md +++ b/use-timescale/extensions/pg-textsearch.md @@ -8,6 +8,7 @@ products: [cloud, self_hosted] import EA1125 from "versionContent/_partials/_early_access_11_25.mdx"; import SINCE010 from "versionContent/_partials/_since_0_1_0.mdx"; +import SINCE040 from "versionContent/_partials/_since_0_4_0.mdx"; import IntegrationPrereqs from "versionContent/_partials/_integration-prereqs.mdx"; # Optimize full text search with BM25 @@ -16,7 +17,9 @@ $PG full-text search at scale consistently hits a wall where performance degrade $COMPANY's [pg_textsearch][pg_textsearch-github-repo] brings modern [BM25][bm25-wiki]-based full-text search directly into $PG, with a memtable architecture for efficient indexing and ranking. `pg_textsearch` integrates seamlessly with SQL and provides better search quality and performance than the $PG built-in full-text search. With Block-Max WAND optimization, -`pg_textsearch` delivers up to **4x faster top-k queries** compared to naive BM25 implementations. +`pg_textsearch` delivers up to **4x faster top-k queries** compared to naive BM25 implementations. Advanced compression +using delta encoding and bitpacking reduces index sizes by **41%** while improving query performance by 10-20% for +shorter queries. BM25 scores in `pg_textsearch` are returned as negative values, where lower (more negative) numbers indicate better matches. 
`pg_textsearch` implements the following: @@ -118,16 +121,15 @@ You have created a BM25 index for full-text search. ## Optimize search queries for performance -Use efficient query patterns to leverage BM25 ranking and optimize search performance. The `<@>` operator supports both -simple text queries and explicit index specification with `to_bm25query()`. Use the simple syntax for `ORDER BY` queries, -but `to_bm25query()` is required for `WHERE` clauses, standalone expressions, and inside PL/pgSQL functions. +Use efficient query patterns to leverage BM25 ranking and optimize search performance. The `<@>` operator with `to_bm25query()` +provides BM25-based ranking scores. The function takes two parameters: the search query text and the index name. 1. **Perform ranked searches using the distance operator** ```sql - SELECT name, description, description <@> 'ergonomic work' as score + SELECT name, description, description <@> to_bm25query('ergonomic work', 'products_search_idx') as score FROM products ORDER BY score LIMIT 3; @@ -162,7 +164,7 @@ but `to_bm25query()` is required for `WHERE` clauses, standalone expressions, an 1. 
**Combine with standard SQL operations** ```sql - SELECT category, name, description <@> 'ergonomic' as score + SELECT category, name, description <@> to_bm25query('ergonomic', 'products_search_idx') as score FROM products WHERE price < 500 AND description <@> to_bm25query('ergonomic', 'products_search_idx') < -0.5 @@ -182,7 +184,7 @@ but `to_bm25query()` is required for `WHERE` clauses, standalone expressions, an ```sql EXPLAIN SELECT * FROM products - ORDER BY description <@> 'ergonomic' + ORDER BY description <@> to_bm25query('ergonomic', 'products_search_idx') LIMIT 5; ``` @@ -258,9 +260,9 @@ Combine `pg_textsearch` with `pgvector` or `pgvectorscale` to build powerful hyb ), keyword_search AS ( SELECT id, - ROW_NUMBER() OVER (ORDER BY content <@> 'query performance') AS rank + ROW_NUMBER() OVER (ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx')) AS rank FROM articles - ORDER BY content <@> 'query performance' + ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx') LIMIT 20 ) SELECT a.id, @@ -298,9 +300,9 @@ Combine `pg_textsearch` with `pgvector` or `pgvectorscale` to build powerful hyb ), keyword_search AS ( SELECT id, - ROW_NUMBER() OVER (ORDER BY content <@> 'query performance') AS rank + ROW_NUMBER() OVER (ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx')) AS rank FROM articles - ORDER BY content <@> 'query performance' + ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx') LIMIT 20 ) SELECT @@ -362,6 +364,13 @@ Customize `pg_textsearch` behavior for your specific use case and data character ``` + ```sql + -- Enable segment compression using delta encoding and bitpacking (enabled by default) + -- Reduces index size by ~41% with 10-20% query performance improvement for shorter queries + SET pg_textsearch.compress_segments = on; + ``` + + 1. 
**Configure language-specific text processing** You can create multiple BM25 indexes on the same column with different language configurations: