simplepractice · kxzk · May 5, 2026 · May 3, 2026 · May 4, 2026 · May 4, 2026
diff --git a/docs/API_REFERENCE.md b/docs/API_REFERENCE.md
@@ -42,11 +42,12 @@ Block receives a `Langfuse::Config` object with these properties:
 | `timeout`                      | Integer | No       | `5`                            | HTTP timeout (seconds)            |
 | `cache_ttl`                    | Integer | No       | `60`                           | Prompt cache TTL (seconds)        |
 | `cache_max_size`               | Integer | No       | `1000`                         | Max cached prompts                |
-| `cache_backend`                | Symbol  | No       | `:memory`                      | `:memory` or `:rails`             |
+| `cache_backend`                | Symbol  | No       | `:memory`                      | `:memory`, `:rails`, or `:auto`   |
 | `cache_lock_timeout`           | Integer | No       | `10`                           | Lock timeout (seconds)            |
 | `cache_stale_while_revalidate` | Boolean | No       | `false`                        | Advisory SWR intent flag (effective activation depends on `cache_stale_ttl`) |
 | `cache_stale_ttl`              | Integer or `:indefinite` | No | `0`                  | Stale TTL (seconds, `>0` enables SWR) |
 | `cache_refresh_threads`        | Integer | No       | `5`                            | Background refresh threads        |
+| `prompt_cache_observer`        | Callable | No      | `nil`                          | Prompt cache event hook           |
 | `batch_size`                   | Integer | No       | `50`                           | Score + trace export batch size   |
 | `flush_interval`               | Integer | No       | `10`                           | Score + trace export interval (s) |
 | `sample_rate`                  | Float   | No       | `1.0`                          | Trace + trace-linked score sampling rate (`0.0..1.0`) |
@@ -215,7 +216,7 @@ Fetch a prompt from Langfuse (with caching).
 **Signature:**
 
 ```ruby
-get_prompt(name, version: nil, label: nil, fallback: nil, type: nil)
+get_prompt(name, version: nil, label: nil, fallback: nil, type: nil, cache_ttl: nil)
 ```
 
 **Parameters:**
@@ -227,6 +228,7 @@ get_prompt(name, version: nil, label: nil, fallback: nil, type: nil)
 | `label`    | String  | No          | Version label (e.g., "production")                   |
 | `fallback` | String or Array<Hash> | No | Fallback prompt if not found (`String` for text, `Array<Hash>` for chat) |
 | `type`     | Symbol  | Conditional | `:text` or `:chat` (required if `fallback` provided) |
+| `cache_ttl` | Integer | No          | Per-call cache TTL override in seconds; `0` bypasses cache read and write |
 
 **Returns:** `Langfuse::TextPromptClient` or `Langfuse::ChatPromptClient`
 
@@ -257,14 +259,57 @@ prompt = client.get_prompt("new-prompt",
 
 See [PROMPTS.md](PROMPTS.md) for complete guide.
 
+### `Client#get_prompt_result`
+
+Fetch a prompt and return it together with cache metadata.
+
+**Signature:**
+
+```ruby
+get_prompt_result(name, version: nil, label: nil, fallback: nil, type: nil, cache_ttl: nil)
+```
+
+**Returns:** `Langfuse::PromptFetchResult`
+
+| Attribute | Type | Description |
+| --------- | ---- | ----------- |
+| `prompt` | TextPromptClient or ChatPromptClient | Prompt client |
+| `logical_key` | String | Stable logical identity: prompt name plus version or label (defaults to the `production` label) |
+| `storage_key` | String | Backend key for the current cache generation |
+| `cache_status` | Symbol | `:hit`, `:miss`, `:stale`, `:refresh`, `:bypass`, or `:disabled` |
+| `source` | Symbol | `:cache`, `:api`, or `:fallback` |
+| `fallback?` | Boolean | Whether fallback content was returned |
+
+```ruby
+result = client.get_prompt_result("greeting", label: "production")
+result.prompt.compile(name: "Ada")
+result.cache_status # => :miss
+```
+
+### Prompt Cache Operations
+
+Operational prompt cache controls are exposed as flat methods on the client:
+
+```ruby
+client.refresh_prompt("greeting", label: "production", cache_ttl: 60)
+client.invalidate_prompt_cache("greeting", label: "production")
+client.invalidate_prompt_cache_by_name("greeting")
+client.clear_prompt_cache
+client.prompt_cache_stats
+client.prompt_cache_key("greeting")
+client.validate_prompt_cache_backend!
+```
+
+`invalidate_prompt_cache_by_name` and `clear_prompt_cache` use generation counters. Rails.cache entries from old generations are not scanned; they become unreachable and expire by TTL.
+
 ### `Client#compile_prompt`
 
 Convenience method: fetch and compile in one call.
 
 **Signature:**
 
 ```ruby
-compile_prompt(name, variables: {}, version: nil, label: nil, fallback: nil, type: nil)
+compile_prompt(name, variables: {}, version: nil, label: nil, fallback: nil, type: nil, cache_ttl: nil)
 ```
 
 **Parameters:**

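The per-call `cache_ttl` semantics documented above (`nil` uses the configured TTL, a positive value overrides the write TTL, `0` skips both read and write) can be sketched in plain Ruby. This is a minimal illustration, not the SDK's implementation: `TinyPromptCache`, `FetchResult`, and the injectable `clock` are hypothetical names, and only the `:hit`, `:miss`, and `:bypass` statuses are modelled.

```ruby
# Hypothetical stand-in for the SDK's prompt cache; not the real implementation.
FetchResult = Struct.new(:value, :cache_status, :source)

class TinyPromptCache
  def initialize(default_ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @default_ttl = default_ttl
    @clock = clock
    @entries = {} # key => [value, expires_at]
  end

  # The block plays the role of the API call that loads a fresh prompt.
  def fetch(key, cache_ttl: nil)
    ttl = cache_ttl.nil? ? @default_ttl : cache_ttl

    # A zero TTL skips the cache read and does not retain the result.
    return FetchResult.new(yield, :bypass, :api) if ttl.zero?

    value, expires_at = @entries[key]
    return FetchResult.new(value, :hit, :cache) if expires_at && @clock.call < expires_at

    # Miss or expired entry: load fresh and store it with the effective TTL.
    fresh = yield
    @entries[key] = [fresh, @clock.call + ttl]
    FetchResult.new(fresh, :miss, :api)
  end
end
```

Injecting the clock keeps the sketch testable without sleeping; a bypassed fetch leaves any existing entry untouched, so the next normal fetch can still be a hit.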
diff --git a/docs/CACHING.md b/docs/CACHING.md
@@ -7,6 +7,7 @@ For configuration options, see [CONFIGURATION.md](CONFIGURATION.md).
 ## Table of Contents
 
 - [Overview](#overview)
+- [Public Cache Operations](#public-cache-operations)
 - [In-Memory Cache](#in-memory-cache-default)
 - [Rails.cache Backend](#railscache-backend-distributed)
 - [Stale-While-Revalidate (SWR)](#stale-while-revalidate-swr)
@@ -22,7 +23,71 @@ The Langfuse Ruby SDK provides two caching backends to optimize prompt fetching:
 1. **In-Memory Cache** (default) - Thread-safe, local cache with TTL and bounded expiration-ordered eviction
 2. **Rails.cache Backend** - Distributed caching with Redis/Memcached
 
-Both backends support TTL-based expiration and stale-while-revalidate (SWR). Distributed stampede protection via locking is specific to the Rails.cache backend; the in-memory backend mitigates stampedes within a single process using Monitor-based single-flight locks.
+Both backends support TTL-based expiration, stale-while-revalidate (SWR), and logical generation-based invalidation. Distributed stampede protection via locking is specific to the Rails.cache backend; the in-memory backend mitigates stampedes within a single process using Monitor-based single-flight locks.
+
+## Public Cache Operations
+
+`get_prompt` remains the normal prompt-returning API. Use `get_prompt_result` when you need cache metadata for logs, metrics, or operational validation:
+
+```ruby
+result = Langfuse.client.get_prompt_result("greeting", label: "production", cache_ttl: 60)
+
+result.prompt        # TextPromptClient or ChatPromptClient
+result.logical_key   # "greeting:production"
+result.storage_key   # Generated backend key for the current cache generation
+result.cache_status  # :hit, :miss, :stale, :refresh, :bypass, or :disabled
+result.source        # :cache, :api, or :fallback
+result.fallback?     # true when caller-provided fallback content was used
+```
+
+Per-call `cache_ttl` overrides the write TTL for that fetch. Passing `cache_ttl: 0` bypasses the cache read, fetches from the API, and does not retain the result:
+
+```ruby
+fresh = Langfuse.client.get_prompt_result("greeting", cache_ttl: 0)
+fresh.cache_status # => :bypass
+```
+
+Use `refresh_prompt` when you intentionally want to bypass the read path and write the fresh prompt through to cache:
+
+```ruby
+result = Langfuse.client.refresh_prompt("greeting", label: "production")
+result.cache_status # => :refresh
+```
+
+The operational cache APIs are flat on the client:
+
+```ruby
+Langfuse.client.invalidate_prompt_cache("greeting", label: "production")
+Langfuse.client.invalidate_prompt_cache_by_name("greeting")
+Langfuse.client.clear_prompt_cache
+
+key = Langfuse.client.prompt_cache_key("greeting")
+key.logical_key # => "greeting:production"
+key.storage_key # Includes the current global and prompt-name generations
+
+Langfuse.client.prompt_cache_stats
+Langfuse.client.validate_prompt_cache_backend!
+```
+
+Cache identity is prompt name plus version or label. When neither is supplied, the logical identity defaults to the `production` label. Runtime variables never enter the cache key; the SDK caches the managed prompt template and compiles variables afterward.
+
+Name-wide invalidation and whole-cache clear use generation counters. Old Rails.cache entries are not physically scanned or deleted; they become unreachable under the new generated storage keys and expire by TTL.
+
+Automatic mutation invalidation covers only `create_prompt` and `update_prompt` calls made by the current SDK process. Prompt edits made in the Langfuse UI or by other SDKs become visible through TTL expiry, `refresh_prompt`, or explicit invalidation.
+
+### Cache Events
+
+Set `prompt_cache_observer` to receive cache events without binding the SDK to your metric names:
+
+```ruby
+Langfuse.configure do |config|
+  config.prompt_cache_observer = lambda do |event, payload|
+    Rails.logger.info(event: event, prompt: payload[:name], status: payload[:cache_status])
+  end
+end
+```
+
+When `ActiveSupport::Notifications` is loaded, the SDK also instruments `prompt_cache.langfuse`. Event payloads include prompt name, version, label, logical key, storage key, backend, cache status, source, and error details when relevant.
 
 ## In-Memory Cache (Default)
 
@@ -102,6 +167,8 @@ Langfuse.configure do |config|
 end
 ```
 
+Set `config.cache_backend = :auto` to let the SDK choose `:rails` when Rails and `Rails.cache` are present and fall back to `:memory` otherwise. This is opt-in; the default remains `:memory`.
+
 ### Features
 
 - **Distributed**: Shared cache across all processes and servers
@@ -499,10 +566,13 @@ RUN bundle exec rake langfuse:warm_cache_all
 
 See [CONFIGURATION.md](CONFIGURATION.md) for all cache-related configuration options:
 
-- `cache_backend` - `:memory` or `:rails`
+- `cache_backend` - `:memory`, `:rails`, or `:auto`
 - `cache_ttl` - Time-to-live in seconds
 - `cache_max_size` - Max prompts (in-memory only)
 - `cache_lock_timeout` - Lock timeout (Rails.cache only)
+- `cache_stale_ttl` - Stale serving window; `> 0` enables SWR
+- `cache_refresh_threads` - Background refresh worker count
+- `prompt_cache_observer` - Optional cache event hook
 
 ## Performance Considerations
 
@@ -559,12 +629,11 @@ config.cache_backend = :rails
 ### 2. Enable SWR for Production
 
 ```ruby
-# Development: disabled for predictable behavior
-config.cache_stale_while_revalidate = !Rails.env.development?
-
-# Production: enabled for best performance
 if Rails.env.production?
+  config.cache_stale_while_revalidate = true  # Advisory intent flag
   config.cache_stale_ttl = config.cache_ttl  # Set explicitly (common default)
+else
+  config.cache_stale_ttl = 0  # Disabled for predictable prompt iteration
 end
 ```
 
@@ -588,8 +657,14 @@ bundle exec rake langfuse:warm_cache_all
 ### 5. Monitor Cache Performance
 
 ```ruby
-# Log cache hits/misses
-Rails.logger.info "Fetching prompt: #{name} (cache: #{cache_hit? ? 'HIT' : 'MISS'})"
+config.prompt_cache_observer = lambda do |event, payload|
+  Rails.logger.info(
+    event: event,
+    prompt: payload[:name],
+    status: payload[:cache_status],
+    source: payload[:source]
+  )
+end
 ```
 
 ### 6. Handle Cache Failures Gracefully
@@ -607,9 +682,11 @@ prompt = Langfuse.client.get_prompt(
 
 ```ruby
 # Rails console
-Langfuse.client.api_client.cache&.clear
+Langfuse.client.invalidate_prompt_cache("greeting", label: "production")
+Langfuse.client.invalidate_prompt_cache_by_name("greeting")
+Langfuse.client.clear_prompt_cache
 
-# Or use rake task
+# Or use the rake task
 rake langfuse:clear_cache
 ```
 
@@ -658,8 +735,10 @@ end
 **Solutions**:
 
 1. Wait for TTL to expire
-2. Clear cache manually: `rake langfuse:clear_cache`
-3. Reduce `cache_ttl` in development
+2. Refresh one prompt now: `Langfuse.client.refresh_prompt("greeting")`
+3. Invalidate every cached entry for one prompt name: `Langfuse.client.invalidate_prompt_cache_by_name("greeting")`
+4. Clear the prompt cache namespace: `Langfuse.client.clear_prompt_cache`
+5. Reduce `cache_ttl` in development
 
 ### Stampede Protection Not Working
 

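The generation-counter invalidation described in CACHING.md can be illustrated with a small sketch. Everything here is an assumption for illustration: `GenerationKeyedCache`, its key layout, and the in-process counters are hypothetical, and the diff does not show the SDK's real storage key format. With a distributed backend the counters would presumably live in the shared store rather than in instance variables.

```ruby
# Hypothetical sketch of generation-keyed storage; the key layout is invented.
class GenerationKeyedCache
  attr_reader :store

  def initialize(store = {})
    @store = store                      # stands in for the cache backend
    @global_generation = 0              # bumped by a whole-cache clear
    @name_generations = Hash.new(0)     # bumped by name-wide invalidation
  end

  # The storage key embeds both generations, so bumping either one
  # changes the key and makes older entries unreachable.
  def storage_key(name, label: "production")
    "g#{@global_generation}:n#{@name_generations[name]}:#{name}:#{label}"
  end

  def write(name, value, label: "production")
    @store[storage_key(name, label: label)] = value
  end

  def read(name, label: "production")
    @store[storage_key(name, label: label)]
  end

  # No scan or delete: old entries stay in the store until their TTL expires.
  def invalidate_by_name(name)
    @name_generations[name] += 1
  end

  def clear
    @global_generation += 1
  end
end
```

The trade-off is that invalidation is a single counter increment regardless of how many entries exist, at the cost of stale entries occupying the backend until they expire.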
diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md
@@ -120,7 +120,7 @@ config.cache_max_size = 5000  # Large prompt library
 
 #### `cache_backend`
 
-- **Type:** Symbol (`:memory` or `:rails`)
+- **Type:** Symbol (`:memory`, `:rails`, or `:auto`)
 - **Default:** `:memory`
 - **Description:** Cache storage backend
 
@@ -130,8 +130,13 @@ config.cache_backend = :memory
 
 # Rails.cache (requires Rails + Redis)
 config.cache_backend = :rails
+
+# Opt in to automatic Rails.cache detection
+config.cache_backend = :auto
 ```
 
+`:auto` chooses `:rails` only when Rails and `Rails.cache` are present; otherwise it falls back to `:memory`. The gem default stays `:memory`.
+
 **Requirements for `:rails` backend:**
 
 - Rails must be defined
@@ -225,6 +230,20 @@ config.cache_refresh_threads = 10  # More threads for high-traffic apps
 
 Only used when SWR is enabled (`cache_stale_ttl > 0`).
 
+#### `prompt_cache_observer`
+
+- **Type:** Callable or `nil`
+- **Default:** `nil`
+- **Description:** Observer hook for prompt cache events
+
+```ruby
+config.prompt_cache_observer = lambda do |event, payload|
+  Rails.logger.info(event: event, prompt: payload[:name], status: payload[:cache_status])
+end
+```
+
+When `ActiveSupport::Notifications` is loaded, the SDK also instruments the `prompt_cache.langfuse` event.
+
 #### `batch_size`
 
 - **Type:** Integer
@@ -599,8 +618,9 @@ Validation rules:
 
 - `public_key` must be present
 - `secret_key` must be present
-- `cache_backend` must be `:memory` or `:rails`
-- If `:rails`, Rails must be defined
+- `cache_backend` must be `:memory`, `:rails`, or `:auto`
+- If `:rails` is selected, or `:auto` resolves to `:rails`, Rails and `Rails.cache` must be available
+- `prompt_cache_observer` must respond to `#call` (if set)
 - `should_export_span` must respond to `#call` (if set)
 - `mask` must respond to `#call` (if set)
 

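The `:auto` resolution and the validation rule for `:rails` can be sketched as follows. This is a hedged illustration only: `resolve_cache_backend` and `rails_cache_available?` are hypothetical helpers, and `ArgumentError` stands in for whatever configuration error the SDK actually raises.

```ruby
# Hypothetical helpers; they mirror the documented rules, not the SDK source.
def rails_cache_available?
  # `defined?` guards the constant lookup so this is safe outside Rails.
  !!(defined?(Rails) && Rails.respond_to?(:cache) && !Rails.cache.nil?)
end

def resolve_cache_backend(setting)
  case setting
  when :memory
    :memory
  when :rails
    # Explicit :rails requires Rails and Rails.cache to be available.
    raise ArgumentError, "Rails.cache is not available" unless rails_cache_available?

    :rails
  when :auto
    # Opt-in detection: prefer Rails.cache, otherwise fall back to memory.
    rails_cache_available? ? :rails : :memory
  else
    raise ArgumentError, "cache_backend must be :memory, :rails, or :auto (got #{setting.inspect})"
  end
end
```

Outside a Rails process, `resolve_cache_backend(:auto)` yields `:memory`, while `resolve_cache_backend(:rails)` raises.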
diff --git a/docs/ERROR_HANDLING.md b/docs/ERROR_HANDLING.md
@@ -50,8 +50,8 @@ end
 **Validation checklist:**
 - `public_key` present and starts with `pk-lf-`
 - `secret_key` present and starts with `sk-lf-`
-- `cache_backend` is `:memory` or `:rails`
-- If `:rails`, Rails is defined
+- `cache_backend` is `:memory`, `:rails`, or `:auto`
+- If `:rails` is selected, or `:auto` resolves to `:rails`, Rails and `Rails.cache` are available
 
 ### `Langfuse::UnauthorizedError`
 
@@ -432,8 +432,9 @@ puts config.inspect
 ### Check Cache State
 
 ```ruby
-cache = Langfuse.client.api_client.cache
-puts "Cache backend: #{cache&.class || 'disabled'}"
+stats = Langfuse.client.prompt_cache_stats
+puts "Cache backend: #{stats[:backend]}"
+puts "Cache enabled: #{stats[:enabled]}"
 ```
 
 ### Test Credentials

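When `prompt_cache_stats` is not enough for debugging, a `prompt_cache_observer` can keep a running tally. The sketch below assumes only what the docs in this diff state: the observer is called with `(event, payload)` and the payload carries `:cache_status`. `CacheStatusTally` is a hypothetical class, and the event names passed to it are not specified here, so the sketch ignores them.

```ruby
# Hypothetical observer; any object responding to #call satisfies the
# documented validation rule for prompt_cache_observer.
class CacheStatusTally
  def initialize
    @counts = Hash.new(0)
    @mutex = Mutex.new # observers may be called from several threads
  end

  def call(_event, payload)
    @mutex.synchronize { @counts[payload[:cache_status]] += 1 }
  end

  # Returns a snapshot so callers cannot mutate the live counters.
  def counts
    @mutex.synchronize { @counts.dup }
  end

  # Share of lookups served from cache among plain hits and misses.
  def hit_ratio
    snapshot = counts
    lookups = snapshot[:hit] + snapshot[:miss]
    lookups.zero? ? 0.0 : snapshot[:hit].to_f / lookups
  end
end
```

It could be wired up with `config.prompt_cache_observer = CacheStatusTally.new` and inspected from a console via `counts` or `hit_ratio`.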
diff --git a/docs/MIGRATION.md b/docs/MIGRATION.md
@@ -685,8 +685,8 @@ prompts:
 ```ruby
 # In Rails console
 Langfuse.reset!  # Clears everything
-# Or just clear cache
-Langfuse.client.api_client.cache&.clear
+# Or just clear the prompt cache namespace
+Langfuse.client.clear_prompt_cache
 ```
 
 ### Problem: Variables not substituting correctly

diff --git a/docs/PROMPTS.md b/docs/PROMPTS.md
@@ -553,7 +553,30 @@ Langfuse.configure do |config|
 end
 ```
 
-See [CACHING.md](CACHING.md) for advanced caching strategies (warming, stampede protection).
+Override or bypass cache per call:
+
+```ruby
+prompt = client.get_prompt("greeting", cache_ttl: 300)
+fresh = client.get_prompt_result("greeting", cache_ttl: 0)
+
+fresh.cache_status # => :bypass
+fresh.source       # => :api
+```
+
+Use `get_prompt_result` and the flat cache operations when you need operational visibility:
+
+```ruby
+result = client.get_prompt_result("greeting")
+result.cache_status # => :hit or :miss
+
+client.refresh_prompt("greeting")
+client.invalidate_prompt_cache("greeting")
+client.invalidate_prompt_cache_by_name("greeting")
+client.clear_prompt_cache
+client.prompt_cache_stats
+```
+
+See [CACHING.md](CACHING.md) for advanced caching strategies, generation-based invalidation, cache events, warming, and stampede protection.
 
 ## Combining Prompts with Tracing