Conversation
Code Review
This pull request effectively addresses an inference precision issue by ensuring that the specified inference_precision is respected throughout the InferenceEngineCacheKV. The changes correctly cast tensors to the desired data type, which resolves the inconsistencies noted. Additionally, the modification to conditionally add "thinking tokens" only when not using a KV cache is a logical improvement for consistency. The code is well-structured, and I have a couple of minor suggestions to enhance conciseness.
```python
inference_dtype = (
    force_inference_dtype
    if force_inference_dtype is not None
    else torch.float32
)
```

```python
inference_dtype = (
    self.force_inference_dtype
    if self.force_inference_dtype is not None
    else torch.float32
)
```
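The dtype-selection pattern in the hunks above can be sketched as a small helper. This is a hedged, dependency-light illustration: it uses NumPy dtypes in place of `torch` dtypes, and `resolve_inference_dtype` is a name invented here, not TabPFN API.

```python
import numpy as np

def resolve_inference_dtype(force_inference_dtype=None):
    # Respect any explicitly requested dtype; fall back to float32 only
    # when no override was given. An `is not None` check (rather than a
    # truthiness test) keeps every explicit choice intact.
    return force_inference_dtype if force_inference_dtype is not None else np.float32
```

With this shape, passing any non-`None` dtype wins, and the float32 default applies only when the caller left the precision unspecified.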
Pull request overview
Fixes prediction inconsistencies across fit_modes by aligning inference-time dtype handling and preventing KV-cache inference from adding extra “thinking” tokens that would change context length / cache behavior.
Changes:
- Make preprocessing reproducible across repeated `predict()` calls by overriding the preprocessing random state in the on-demand inference engine.
- Force model parameters and input tensors to the requested `inference_precision` dtype for KV-cache inference.
- Skip adding thinking tokens during KV-cache prediction (`single_eval_pos == 0`) to keep the cacheable context stable.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/tabpfn/inference.py` | Adjusts the preprocessing seeding override and forces inference dtype casting for the KV-cache path. |
| `src/tabpfn/architectures/base/transformer.py` | Avoids adding thinking tokens during KV-cache prediction to preserve cache consistency. |
```diff
     y_train=self.y_train,
     feature_schema=self.feature_schema,
     parallel_mode="in-order",
-    override_random_state=np.random.default_rng(self.static_seed),
+    override_random_state=self.static_seed,
 )
```
`override_random_state` is now passed as an int (`self.static_seed`). In `TabPFNEnsemblePreprocessor.fit_transform_ensemble_members_iterator` the random state is selected via `override_random_state or self.random_state`, which ignores an override of `0` (since `0` is falsy) and falls back to `self.random_state`, reintroducing non-deterministic preprocessing across `predict()` calls. Prefer either passing a truthy override (e.g., a `np.random.Generator`, as before) or, better, changing the downstream selection to `override_random_state if override_random_state is not None else self.random_state` so that seed `0` is respected.
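A minimal, self-contained repro of the falsy-zero pitfall described above. The two selection functions are simplified stand-ins written for this comment, not the actual TabPFN implementation.

```python
def pick_random_state_or(override_random_state, default_random_state):
    # Buggy selection: `or` treats seed 0 as "no override given" and
    # silently falls back to the default.
    return override_random_state or default_random_state

def pick_random_state_checked(override_random_state, default_random_state):
    # Fixed selection: only a literal None means "no override", so an
    # explicit seed of 0 is respected.
    return (
        override_random_state
        if override_random_state is not None
        else default_random_state
    )
```

With an override of `0` and a default of `42`, the `or` version returns `42` while the `is not None` version returns `0`, which is exactly the non-determinism the review comment warns about.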
```python
is_kv_cache_prediction = (
    self.cache_trainset_representation and single_eval_pos == 0
)
if self.add_thinking_tokens is not None and not is_kv_cache_prediction:
    embedded_input, single_eval_pos = self.add_thinking_tokens(
```
This change alters when thinking tokens are added (they are skipped for KV-cache prediction when `single_eval_pos == 0`). There is currently no test covering this specific behavior/contract (e.g., that the `fit_with_cache` prediction path does not append thinking tokens and stays consistent with the other fit modes for a fixed seed). Please add or adjust a unit/integration test to lock this in; re-enabling the existing skipped "fit modes return equal results" tests (or adding a targeted regression test for #631) would help prevent regressions.
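One way such a regression check could be shaped is sketched below. `make_predictions` is a hypothetical stand-in: a real test would fit TabPFN with the given fit mode on a fixed dataset and return `predict_proba`; here a seeded RNG plays that role so the structure of the check is runnable on its own.

```python
import numpy as np

def make_predictions(fit_mode, seed=0):
    # Stand-in for fitting/predicting with a given fit mode. In a real
    # test this would train on a fixed dataset with a fixed seed; the
    # shared seed here guarantees identical "predictions" per mode.
    rng = np.random.default_rng(seed)
    return rng.random((8, 2))

baseline = make_predictions("no_cache")
for mode in ("fit_preprocessors", "fit_with_cache"):
    # Max absolute deviation from the baseline fit mode should be ~0.
    diff = float(np.max(np.abs(make_predictions(mode) - baseline)))
    assert diff <= 1e-6, f"{mode} deviates from no_cache by {diff}"
```

The tolerance would need to be loosened (e.g., to ~1e-5) for float32 inference, per the numbers reported later in this thread.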
Issue
Fixes #631.
I tried to fix the precision issue. The first fix came from #784, which skips adding thinking tokens so that `single_eval_pos` stays zero and the KV cache can be used during prediction. I then cast all tensors to the given dtype, after which the results became:
no_cache vs repeat: 0.0
no_cache vs fit_preprocessors: 0.0
no_cache vs fit_with_cache: 0.0
The only caveat is that when I ran the script with float32 there were still some small inconsistencies:
no_cache vs repeat: 5.3390077e-06
no_cache vs fit_preprocessors: 5.3390077e-06
no_cache vs fit_with_cache: 5.5486857e-06
but I am guessing these come from the lower precision of float32 compared to float64.
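A tiny NumPy illustration of why float32 alone can plausibly produce differences at the ~1e-6 level: float32 carries roughly 7 decimal digits, so near large magnitudes small contributions are lost to rounding entirely, while float64 keeps them.

```python
import numpy as np

big = np.float32(1e8)  # exactly representable in float32
one = np.float32(1.0)

# The spacing between adjacent float32 values near 1e8 is 8.0, so the
# added 1.0 is absorbed by rounding and then cancelled away completely:
lost = (big + one) - big  # -> 0.0 in float32

# The same arithmetic in float64 retains the contribution:
kept = (np.float64(1e8) + np.float64(1.0)) - np.float64(1e8)  # -> 1.0
```

Depending on the order in which such roundings accumulate across a forward pass, two mathematically equivalent code paths can diverge by a few ulps, which is consistent with discrepancies on the order of 5e-6 seen above.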
Also, `tests/test_consistency.py` fails locally, but I assume its stored predictions serve as reference values for detecting future deviations.
Motivation and Context
Code run on my local machine to produce the above results:
Public API Changes
How Has This Been Tested?
Tested locally on a MacBook CPU only (no GPU).
Collecting system and dependency information...
PyTorch version: 2.10.0
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 15.7.3 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.0.13.5)
CMake version: version 3.31.1
Libc version: N/A
Python version: 3.11.9 (main, Nov 22 2024, 14:33:40) [Clang 14.0.3 (clang-1403.0.22.14.1)] (64-bit runtime)
Python platform: macOS-15.7.3-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M1 Max
Dependency Versions:
tabpfn: 6.4.1
torch: 2.10.0
numpy: 2.4.2
scipy: 1.17.1
pandas: 2.3.3
scikit-learn: 1.8.0
typing_extensions: 4.15.0
einops: 0.8.2
huggingface-hub: 1.5.0
Checklist
Changelog entry added (see changelog/README.md), or "no changelog needed" label requested.