### Describe the bug

Hi, thanks for open-sourcing this great project!

We recently tried running the ranking task on the KuaiRand-1K dataset, but observed a significant inconsistency between the training-time evaluation results and the standalone inference results.
### Steps/Code to reproduce bug

**Training command:**

```shell
PYTHONPATH=${PYTHONPATH}:$(realpath ../) \
torchrun --nproc_per_node 1 --master_addr localhost --master_port 6000 \
    ./training/pretrain_gr_ranking.py \
    --gin-config-file ./training/configs/kuairand_1k_ranking.gin
```

**Inference command:**

```shell
PYTHONPATH=${PYTHONPATH}:$(realpath ../) \
torchrun --nproc_per_node 1 --master_addr localhost --master_port 6000 \
    ./inference/inference_gr_ranking.py \
    --gin_config_file ./inference/configs/kuairand_1k_inference_ranking.gin \
    --checkpoint_dir ckpts/iter550/ --mode eval
```
### Expected behavior

We expect the standalone inference metrics on the saved checkpoint to match the training-time evaluation metrics at the same iteration. Instead, the two differ substantially:

**Training-time evaluation results at iteration 550:**

```
Metrics.task0.AUC: 0.703728
Metrics.task1.AUC: 0.532176
Metrics.task2.AUC: 0.664358
Metrics.task3.AUC: 0.474177
Metrics.task4.AUC: 0.644220
Metrics.task5.AUC: 0.377520
Metrics.task6.AUC: 0.691218
Metrics.task7.AUC: 0.558680
```

**Inference results using the saved checkpoint (`ckpts/iter550/`):**

```
Metrics.task0.AUC: 0.350148
Metrics.task1.AUC: 0.637459
Metrics.task2.AUC: 0.331639
Metrics.task3.AUC: 0.659130
Metrics.task4.AUC: 0.430801
Metrics.task5.AUC: 0.649176
Metrics.task6.AUC: 0.329165
Metrics.task7.AUC: 0.493231
```
### Environment details (please complete the following information)

- Environment location: Docker
- Method of recsys-examples install: Docker
  - Training image: `docker pull shijieliu01/recsys-examples:2026.1.9`
  - Inference image: `docker pull shijieliu01/recsys-examples:inference.2026.1.14`
  - Containers started with `docker run --gpus all -it --name gr_training shijieliu01/recsys-examples:2026.1.9` and `docker run --gpus all -it --name gr_inference shijieliu01/recsys-examples:inference.2026.1.14`
- Run `print_env.sh` from the project root and paste the results here:
<details><summary>Click here to see environment details</summary><pre>
***git***
Not inside a git repository
***OS Information***
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=24.04
DISTRIB_CODENAME=noble
DISTRIB_DESCRIPTION="Ubuntu 24.04.2 LTS"
PRETTY_NAME="Ubuntu 24.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.2 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
Linux ee8f456bcbfa 5.15.0-105-generic #115-Ubuntu SMP Mon Apr 15 09:52:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
***GPU Information***
Thu Mar 26 11:58:20 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.20 Driver Version: 580.126.20 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:13:00.0 Off | 0 |
| N/A 43C P0 250W / 400W | 68333MiB / 81920MiB | 98% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:1C:00.0 Off | 0 |
| N/A 22C P0 62W / 400W | 4MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
***CPU***
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 45 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7443 24-Core Processor
CPU family: 25
Model: 1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 32
Stepping: 1
BogoMIPS: 5689.31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext invpcid_single ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero wbnoinvd arat umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 2 MiB (64 instances)
L1i cache: 2 MiB (64 instances)
L2 cache: 32 MiB (64 instances)
L3 cache: 1 GiB (32 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-63
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; safe RET
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
***CMake***
/usr/local/bin/cmake
cmake version 3.31.6
CMake suite maintained and supported by Kitware (kitware.com/cmake).
***g++***
/usr/bin/g++
g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
***nvcc***
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Tue_May_27_02:21:03_PDT_2025
Cuda compilation tools, release 12.9, V12.9.86
Build cuda_12.9.r12.9/compiler.36037853_0
***Python***
/usr/bin/python
Python 3.12.3
***Environment Variables***
PATH : /usr/local/lib/python3.12/dist-packages/torch_tensorrt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/mpi/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin:/opt/amazon/efa/bin:/opt/tensorrt/bin
LD_LIBRARY_PATH : /usr/local/lib/python3.12/dist-packages/torch/lib:/usr/local/lib/python3.12/dist-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
NUMBAPRO_NVVM :
NUMBAPRO_LIBDEVICE :
CONDA_PREFIX :
PYTHON_PATH :
conda not found
***pip packages***
/usr/local/bin/pip
Package Version Editable project location
-------------------------- ----------------------------- ---------------------------
absl-py 2.3.0
aiohappyeyeballs 2.6.1
aiohttp 3.12.7
aiosignal 1.3.2
annotated-types 0.7.0
anyio 4.9.0
apex 0.1
argon2-cffi 25.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
asciitree 0.3.3
asttokens 3.0.0
astunparse 1.6.3
async-lru 2.0.5
attrs 25.3.0
audioread 3.0.1
babel 2.17.0
beautifulsoup4 4.13.4
black 25.1.0
bleach 6.2.0
blis 0.7.11
cachetools 6.0.0
catalogue 2.0.10
certifi 2025.4.26
cffi 1.17.1
cfgv 3.5.0
charset-normalizer 3.4.2
click 8.2.1
cloudpathlib 0.21.1
cloudpickle 3.1.1
cmake 3.31.6
comm 0.2.2
confection 0.1.5
contourpy 1.3.2
cuda-bindings 12.9.0
cuda-python 12.9.0
cudf 25.4.0
cudf-polars 25.4.0
cugraph 25.4.0
cugraph-service-client 25.4.0
cugraph-service-server 25.4.0
cuml 25.4.0
cupy-cuda12x 13.3.0
cuvs 25.4.0
cycler 0.12.1
cymem 2.0.11
Cython 3.1.1
dask 2025.2.0
dask-cuda 25.4.0
dask-cudf 25.4.0
debugpy 1.8.14
decorator 5.2.1
defusedxml 0.7.1
dill 0.4.0
distlib 0.4.0
distributed 2025.2.0
distributed-ucxx 0.43.0
distro 1.9.0
dm-tree 0.1.9
docker 7.1.0
docstring_parser 0.17.0
dynamicemb 0.0.1+961ac1e
einops 0.8.1
execnet 2.1.1
executing 2.2.0
expecttest 0.3.0
fasteners 0.19
fastjsonschema 2.21.1
fastrlock 0.8.3
fbgemm_gpu_nightly 2026.1.8
filelock 3.20.2
flash_attn 2.7.4.post1
fonttools 4.58.1
fqdn 1.5.1
frozenlist 1.6.0
fsspec 2025.5.1
gast 0.6.0
gin-config 0.5.0
grpcio 1.62.1
h11 0.16.0
hstu_attn 0.1.0+961ac1e.cu12.9
hstu_cuda_ops 0.0.0
hstu-hopper 0.1.1+961ac1e.cu12.9
httpcore 1.0.9
httpx 0.28.1
hypothesis 6.130.8
identify 2.6.15
idna 3.10
importlib_metadata 8.7.0
iniconfig 2.1.0
intel-openmp 2021.4.0
iopath 0.1.10
ipykernel 6.29.5
ipython 9.3.0
ipython_pygments_lexers 1.1.1
isoduration 20.11.0
isort 6.0.1
jedi 0.19.2
Jinja2 3.1.6
joblib 1.5.1
json5 0.12.0
jsonpointer 3.0.0
jsonschema 4.24.0
jsonschema-specifications 2025.4.1
jupyter_client 8.6.3
jupyter_core 5.8.1
jupyter-events 0.12.0
jupyter-lsp 2.2.5
jupyter_server 2.16.0
jupyter_server_terminals 0.5.3
jupyterlab 4.4.3
jupyterlab_code_formatter 3.0.2
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.3
jupyterlab_tensorboard_pro 4.0.0
jupytext 1.17.2
kiwisolver 1.4.8
kvikio 25.4.0
langcodes 3.5.0
language_data 1.3.0
lazy_loader 0.4
libcudf 25.4.0
libcugraph 25.4.0
libcuml 25.4.0
libcuvs 25.4.0
libkvikio 25.4.0
libraft 25.4.0
librmm 25.4.0
librmm-cu12 25.4.0
librosa 0.11.0
libucxx 0.43.0
lightning-thunder 0.2.3.dev0
lightning-utilities 0.14.3
lintrunner 0.12.7
llvmlite 0.42.0
locket 1.0.0
looseversion 1.3.0
marisa-trie 1.2.1
Markdown 3.8
markdown-it-py 3.0.0
MarkupSafe 3.0.2
matplotlib 3.10.3
matplotlib-inline 0.1.7
mdit-py-plugins 0.4.2
mdurl 0.1.2
megatron-core 0.12.1 /workspace/deps/megatron-lm
mistune 3.1.3
mkl 2021.1.1
mkl-devel 2021.1.1
mkl-include 2021.1.1
mock 5.2.0
mpmath 1.3.0
msgpack 1.1.0
multidict 6.4.4
murmurhash 1.0.13
mypy_extensions 1.1.0
nbclient 0.10.2
nbconvert 7.16.6
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.5
ninja 1.11.1.4
nodeenv 1.10.0
notebook 7.4.3
notebook_shim 0.2.4
numba 0.59.1
numba-cuda 0.4.0
numcodecs 0.13.1
numpy 1.26.4
nvdlfw_inspect 0.1.0
nvfuser 0.2.27a0+9bf5aca
nvidia-cudnn-frontend 1.12.0
nvidia-cutlass-dsl 4.3.0
nvidia-dali-cuda120 1.50.0
nvidia-ml-py 12.575.51
nvidia-modelopt 0.29.0
nvidia-modelopt-core 0.29.0
nvidia-nvcomp-cu12 4.2.0.14
nvidia-nvimgcodec-cu12 0.5.0.13
nvidia-nvjpeg-cu12 12.4.0.16
nvidia-nvjpeg2k-cu12 0.8.1.40
nvidia-nvtiff-cu12 0.5.0.67
nvidia-resiliency-ext 0.4.0
nvtx 0.2.11
nx-cugraph 25.4.0
onnx 1.17.0
opt_einsum 3.4.0
optree 0.16.0
ordered-set 4.1.0
orjson 3.11.5
overrides 7.7.0
packaging 25.0
pandas 2.2.3
pandocfilters 1.5.1
parso 0.8.4
partd 1.4.2
pathspec 0.12.1
pexpect 4.9.0
pillow 11.2.1
pip 25.1.1
platformdirs 4.3.8
pluggy 1.6.0
ply 3.11
polars 1.25.2
polygraphy 0.49.20
pooch 1.8.2
portalocker 3.2.0
pre_commit 4.5.1
preshed 3.0.10
prometheus_client 0.22.1
prompt_toolkit 3.0.51
propcache 0.3.1
protobuf 4.24.4
psutil 7.0.0
ptyprocess 0.7.0
PuLP 3.2.1
pure_eval 0.2.3
pyarrow 19.0.1
pybind11 2.13.6
pybind11_global 2.13.6
pycocotools 2.0+nv0.8.1
pycparser 2.22
pydantic 2.11.5
pydantic_core 2.33.2
Pygments 2.19.1
pylibcudf 25.4.0
pylibcugraph 25.4.0
pylibcugraphops 25.4.0
pylibraft 25.4.0
pylibwholegraph 25.4.0
pynvjitlink 0.3.0
pynvml 12.0.0
pyparsing 3.2.3
pyre-extensions 0.0.32
pytest 8.1.1
pytest-flakefinder 1.1.0
pytest-rerunfailures 15.1
pytest-shard 0.1.2
pytest-xdist 3.7.0
python-dateutil 2.9.0.post0
python-hostlist 2.2.1
python-json-logger 3.3.0
pytorch-triton 3.3.0+git96316ce52.nvinternal
pytz 2023.4
pyvers 0.1.0
PyYAML 6.0.2
pyzmq 26.4.0
raft-dask 25.4.0
rapids-dask-dependency 25.4.0a0
rapids-logger 0.1.18
referencing 0.36.2
regex 2024.11.6
requests 2.32.3
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 14.0.0
rmm 25.4.0
rpds-py 0.25.1
safetensors 0.5.3
scikit-build 0.18.1
scikit-learn 1.6.1
scipy 1.15.3
Send2Trash 1.8.3
setuptools 78.1.1
setuptools-git-versioning 2.1.0
shellingham 1.5.4
six 1.16.0
smart-open 7.1.0
sniffio 1.3.1
sortedcontainers 2.4.0
soundfile 0.13.1
soupsieve 2.7
soxr 0.5.0.post1
spacy 3.7.5
spacy-legacy 3.0.12
spacy-loggers 1.0.5
srsly 2.5.1
stack-data 0.6.3
sympy 1.14.0
tabulate 0.9.0
tbb 2021.13.1
tblib 3.1.0
tensorboard 2.16.2
tensorboard-data-server 0.7.2
tensordict 0.10.0
tensorrt 10.11.0.33
terminado 0.18.1
thinc 8.2.5
threadpoolctl 3.6.0
thriftpy2 0.5.2
tinycss2 1.4.0
toolz 1.0.0
torch 2.8.0a0+5228986c39.nv25.6
torch_tensorrt 2.8.0a0
torchao 0.11.0+git
torchmetrics 1.0.3
torchprofile 0.0.4
torchrec 1.2.0+440b1c6
torchvision 0.22.0a0+95f10a4e
torchx 0.7.0
tornado 6.5.1
tqdm 4.67.1
traitlets 5.14.3
transformer_engine 2.4.0+3cd6870
treelite 4.4.1
typer 0.16.0
types-dataclasses 0.6.6
types-python-dateutil 2.9.0.20250516
typing_extensions 4.14.0
typing-inspect 0.9.0
typing-inspection 0.4.1
tzdata 2025.2
ucx-py 0.43.0
ucxx 0.43.0
uri-template 1.3.0
urllib3 1.26.20
virtualenv 20.36.0
wasabi 1.1.3
wcwidth 0.2.13
weasel 0.4.1
webcolors 24.11.1
webencodings 0.5.1
websocket-client 1.8.0
Werkzeug 3.1.3
wheel 0.45.1
wrapt 1.17.2
xdoctest 1.0.2
xgboost 2.1.4
yarl 1.20.0
zarr 2.18.7
zict 3.0.0
zipp 3.22.0
</pre></details>
### Additional context

The results are not just inconsistent but deviate by a large margin, and for several tasks the inference AUC looks nearly inverted relative to the training-time AUC, which seems unexpected.
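To make the "nearly inverted" observation concrete, a quick check on the AUCs reported above shows the inference AUCs are, on average, much closer to 1 − (training AUC) than to the training AUC itself (mean gap of roughly 0.067 vs. 0.236). This is only a hypothesis, but it might hint at flipped labels or negated scores somewhere in the inference path:

```python
# Quick check on the AUCs reported above: is the inference AUC closer to the
# training AUC, or to 1 - training AUC (i.e. an inverted ranking)?
train_auc = [0.703728, 0.532176, 0.664358, 0.474177,
             0.644220, 0.377520, 0.691218, 0.558680]
infer_auc = [0.350148, 0.637459, 0.331639, 0.659130,
             0.430801, 0.649176, 0.329165, 0.493231]

direct_gap = [abs(t - i) for t, i in zip(train_auc, infer_auc)]
inverted_gap = [abs((1.0 - t) - i) for t, i in zip(train_auc, infer_auc)]

print(f"mean |train - infer|       = {sum(direct_gap) / len(direct_gap):.4f}")
print(f"mean |(1 - train) - infer| = {sum(inverted_gap) / len(inverted_gap):.4f}")
```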
### Questions

- Has the consistency between training-time evaluation and standalone inference been verified recently for this example?
- Are there any known pitfalls or required steps when running inference (e.g., feature preprocessing, normalization, checkpoint loading, or dynamic embedding handling)?
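If it helps narrow things down, here is the kind of check we could run ourselves if pointed at the right hooks (a hypothetical helper, not part of the repo): dump per-sample logits for the same eval batch from both the training-eval path and the inference path, then test for both a direct match and a sign-flipped match, since a sign flip would be consistent with the nearly inverted AUCs:

```python
def compare_logit_dumps(train_logits, infer_logits, atol=1e-4):
    """Compare per-sample logits dumped from the two code paths.

    Returns (direct_match, flipped_match). flipped_match is True when the
    inference logits equal the *negated* training logits, which would be
    consistent with AUCs close to 1 - training AUC.
    """
    direct_gap = max(abs(t - i) for t, i in zip(train_logits, infer_logits))
    flipped_gap = max(abs(t + i) for t, i in zip(train_logits, infer_logits))
    return direct_gap <= atol, flipped_gap <= atol
```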
Any guidance would be greatly appreciated. Thanks in advance!
By submitting this issue, you agree to follow our code of conduct and our contributing guidelines.